THE GAUSSIAN PRIMES CONTAIN ARBITRARILY SHAPED CONSTELLATIONS
Abstract. We show that the Gaussian primes $P[i] \subseteq \mathbf{Z}[i]$ contain infinitely
many constellations of any prescribed shape and orientation. More precisely, given
any distinct Gaussian integers $v_0, \ldots, v_{k-1}$, we show that there are infinitely
many sets $\{a + r v_0, \ldots, a + r v_{k-1}\}$, with $a \in \mathbf{Z}[i]$ and
$r \in \mathbf{Z} \setminus \{0\}$, all of whose elements are Gaussian primes.

The proof is modeled on that in [9] and requires three ingredients. The first
is a hypergraph removal lemma of Gowers and Rödl-Skokan, or more precisely
a slight strengthening of this lemma which can be found in [22]; this hypergraph
removal lemma can be thought of as a generalization of the
Szemerédi-Furstenberg-Katznelson theorem concerning multidimensional arithmetic
progressions. The second ingredient is the transference argument from [9], which
allows one to extend this hypergraph removal lemma to a relative version,
weighted by a pseudorandom measure. The third ingredient is a Goldston-Yıldırım
type analysis for the Gaussian integers, similar to that in [9], which
yields a pseudorandom measure which is concentrated on Gaussian "almost primes".



1. Introduction 



A famous and deep theorem of Szemerédi [19] asserts that any set of integers of
positive upper density contains arbitrarily long arithmetic progressions. This
theorem was extended by Furstenberg and Katznelson [2] to higher dimensions, as
follows. If $Z$ is an additive group, we define a shape in $Z$ to be a finite collection
$(v_j)_{j \in J} \in Z^J$ of distinct elements in $Z$. A constellation in $Z$ with this shape is
defined to be any $J$-tuple of the form $(a + r v_j)_{j \in J} \in Z^J$, where $a \in Z$ and
$r \in \mathbf{Z}$, with all of the $a + r v_j$ being distinct. Note that we can define the product
of an integer $r \in \mathbf{Z}$ with an additive group element $v \in Z$ in the usual manner.



Theorem 1.1 (Multidimensional Szemerédi's theorem, combinatorial version). [2]
Let $d \geq 1$, and let $A$ be a subset of the lattice $\mathbf{Z}^d$ whose upper Banach density is
strictly positive, thus

$$\limsup_{N \to \infty} \frac{|A \cap [-N,N]^d|}{|[-N,N]^d|} > 0,$$

where $[-N,N] := \{n \in \mathbf{Z} : -N \leq n \leq N\}$ and $|A|$ denotes the cardinality of
$A$. Then for any given shape $(v_j)_{j \in J}$ in $\mathbf{Z}^d$, the set $A$ contains infinitely many
constellations $(a + r v_j)_{j \in J}$ with that shape.
Now consider the Gaussian primes $P[i]$ in the Gaussian integers
$\mathbf{Z}[i] := \{a + bi : a, b \in \mathbf{Z}\}$, defined as those Gaussian integers $p$ which
have no proper factors (other than units $1, i, -1, -i$ and associates $p, ip, -p, -ip$).
One can identify $\mathbf{Z}[i]$ with $\mathbf{Z}^2$ in the obvious manner; however, when one does so,
the upper Banach density of $P[i]$ is zero and so Theorem 1.1 does not directly apply.
Nevertheless, we are able to establish the following result, which is the main result
of this paper.

Theorem 1.2 (Constellations in the Gaussian primes). Let $(v_j)_{j \in J}$ be any shape
in the Gaussian integers $\mathbf{Z}[i]$. Then the Gaussian primes $P[i]$ contain infinitely
many constellations with this shape.
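To make the statement of Theorem 1.2 concrete, the following is a minimal brute-force sketch (not the method of this paper) that searches a small box for constellations $a + r v_j$ of a given shape. Gaussian integers are represented as integer pairs `(real, imag)`; the function names `is_gaussian_prime` and `find_constellations`, and the search parameters `bound` and `max_r`, are illustrative assumptions. The primality test uses the standard classification: $a + bi$ with $ab \neq 0$ is prime exactly when $a^2 + b^2$ is a rational prime, and a Gaussian integer on an axis is prime exactly when its absolute value is a rational prime congruent to 3 mod 4.

```python
def is_rational_prime(n):
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def is_gaussian_prime(a, b):
    """Test whether a + bi is a Gaussian prime."""
    if a == 0 or b == 0:
        # On the axes: |a + bi| must be a rational prime that is 3 mod 4.
        m = abs(a) + abs(b)
        return m % 4 == 3 and is_rational_prime(m)
    # Off the axes: the norm a^2 + b^2 must be a rational prime.
    return is_rational_prime(a * a + b * b)


def find_constellations(shape, bound, max_r):
    """List all (a, r) with |Re a|, |Im a| <= bound and 1 <= r <= max_r
    such that every a + r*v (v in shape) is a Gaussian prime.
    `shape` is a list of distinct integer pairs (hypothetical input format)."""
    found = []
    for r in range(1, max_r + 1):
        for a in range(-bound, bound + 1):
            for b in range(-bound, bound + 1):
                points = [(a + r * vx, b + r * vy) for vx, vy in shape]
                if all(is_gaussian_prime(x, y) for x, y in points):
                    found.append(((a, b), r))
    return found


# An "L"-shaped constellation with shape {0, 1, i}.
l_shape = [(0, 0), (1, 0), (0, 1)]
examples = find_constellations(l_shape, bound=5, max_r=2)
```

For instance, $a = 1 + 2i$ and $r = 2$ give the points $1+2i$, $3+2i$, $1+4i$, of norms 5, 13, 17. Such a search only illustrates the statement; it says nothing about infinitude, which is the content of the theorem.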

Theorem 1.2 can be thought of as the Gaussian counterpart of the recent result
in [9] that the rational primes $P = \{2, 3, 5, \ldots\}$ contain arbitrarily long arithmetic
progressions. The latter result is connected to the $d = 1$ case of Theorem 1.1,
whereas the results here are connected to the $d = 2$ case. It is likely that the method
also extends to cover some further results of this type; see Section 12. We remark
that the scaling parameter $r$ can be chosen to be positive, by the rather crude
expedient of replacing the constellation $(v_j)_{j \in J}$ with the symmetrized constellation
$(v_j)_{j \in J} \cup (-v_j)_{j \in J}$.



Our approach to proving Theorem 1.2 basically follows the strategy of [9]. A direct
execution of that strategy would proceed by somehow transferring Theorem 1.1 to a
relative version, weighted by a pseudorandom measure. One would then construct a 
pseudorandom measure concentrated on the Gaussian "almost primes" to conclude 
the argument. It may well be possible to carry out this approach; however we 
have proceeded by a slightly different route, not working with Theorem 1.1 but 
a stronger result, which we call a "strong hypergraph removal lemma", which we 
shall discuss shortly. (We will, however, still need to construct a pseudorandom 
measure concentrated in Gaussian almost primes). 

Theorem 1.1, in the contrapositive, implies in particular that any subset of $\mathbf{Z}^d$
which contains only finitely many constellations of a prescribed shape must have
density zero. A more quantitative version of this assertion is as follows. Given any
finite non-empty set $Z$ and any function $f : Z \to \mathbf{R}$, we use
$\mathbf{E}(f) = \mathbf{E}(f|Z) = \mathbf{E}(f(x)|x \in Z) := \frac{1}{|Z|} \sum_{x \in Z} f(x)$
to denote the average value of $f$. If $x, y_1, \ldots, y_n$ are parameters and $X > 0$ is a
positive quantity, we use $o_{x \to 0; y_1, \ldots, y_n}(X)$ to denote any quantity bounded in
magnitude by $c(x, y_1, \ldots, y_n) X$, where $c$ is a function which goes to zero as $x \to 0$
for each fixed choice of $y_1, \ldots, y_n$. Similarly we use $O_{y_1, \ldots, y_n}(X)$ to denote any
quantity bounded in magnitude by $C(y_1, \ldots, y_n) X$ for some quantity
$C(y_1, \ldots, y_n) > 0$.

Theorem 1.3 (Multidimensional Szemerédi's theorem, expectation version). Let
$Z, Z'$ be two finite additive groups, and let $(\phi_j)_{j \in J}$ be a finite collection of group
homomorphisms $\phi_j : Z \to Z'$. Let $A$ be a subset of $Z'$. If we have

$$\mathbf{E}\Big( \prod_{j \in J} 1_A(x + \phi_j(r)) \,\Big|\, x \in Z'; r \in Z \Big) \leq \delta$$

for some $0 < \delta \leq 1$, then we have

$$\mathbf{E}(1_A(x) | x \in Z') = o_{\delta \to 0; |J|}(1).$$

This particular result does not appear explicitly in the literature, but it follows 
from the work of Furstenberg and Katznelson [2] in the cyclic case Z = Z/NZ, 
and from their later work [3] on a density version of the Hales-Jewett theorem for 
the general case. It also follows from the hypergraph analysis of Gowers [8] and
Rödl-Skokan [14], [15], or more precisely from Theorem 1.7 below. It is easy to see
that Theorem 1.3 implies Theorem 1.1, by localizing the situation in Theorem 1.1
to a cyclic group such as $\mathbf{Z}_N^d$ for a large prime $N$, and then letting $N \to \infty$; we
omit the standard details.

The proof of Theorem 1.3 sketched above used methods from ergodic theory. At 
first glance, it seems that the additive structures of the groups Z and Z' must play 
a key role; for instance, in the ergodic arguments of [2], this structure is captured 
in the algebra of multiple commuting shifts on a probability space. However, it 
is a remarkable fact, observed by multiple authors, that Theorem 1.3 (and hence 
Theorem 1.1) can in fact be deduced from a stronger result - namely a "hypergraph 
removal lemma" - in which no additive structure is present. We shall state this 
stronger result (or more precisely, a refinement of this result in [22]) shortly, but 
first we need some notation. 

Definition 1.4 (Hypergraphs). If $J$ is a finite set and $d \geq 0$, we define
$\binom{J}{d} := \{e \subseteq J : |e| = d\}$ to be the set of all subsets of $J$ of cardinality $d$.
A $d$-uniform hypergraph on $J$ is then defined to be any subset $H \subseteq \binom{J}{d}$ of $\binom{J}{d}$.

Definition 1.5 (Hypergraph systems). A hypergraph system is a quadruplet
$V = (J, (V_j)_{j \in J}, d, H)$, where $J$ is a finite set, $(V_j)_{j \in J}$ is a collection of finite
non-empty sets indexed by $J$, $d \geq 1$ is a positive integer, and $H \subseteq \binom{J}{d}$ is a
$d$-uniform hypergraph. For any $e \subseteq J$, we set $V_e := \prod_{j \in e} V_j$, and let
$\pi_e : V_J \to V_e$ be the canonical projection map. For each $e \subseteq J$, let $\mathcal{A}_e$ be the
$\sigma$-algebra on $V_J$ defined by $\mathcal{A}_e := \{\pi_e^{-1}(E) : E \subseteq V_e\}$.
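The objects in Definitions 1.4 and 1.5 are finite and can be enumerated directly. The sketch below is an illustration under assumed representations (points of $V_J$ as tuples ordered by the sorted index set, and the hypothetical helper names `subsets_of_size` and `is_in_A_e`): it lists $\binom{J}{d}$ and tests whether a subset $S \subseteq V_J$ lies in $\mathcal{A}_e$, that is, whether $S = \pi_e^{-1}(\pi_e(S))$.

```python
from itertools import combinations, product


def subsets_of_size(J, d):
    """All d-element subsets of J, i.e. the set binom(J, d)."""
    return [frozenset(c) for c in combinations(sorted(J), d)]


def is_in_A_e(S, J, V, e):
    """Return True if S (a set of tuples in V_J) belongs to the sigma-algebra A_e,
    i.e. membership in S depends only on the coordinates indexed by e."""
    order = sorted(J)
    positions = [order.index(j) for j in sorted(e)]

    def project(x):
        # Canonical projection pi_e, applied to a point x of V_J.
        return tuple(x[p] for p in positions)

    image = {project(x) for x in S}
    all_points = product(*(V[j] for j in order))
    preimage = {x for x in all_points if project(x) in image}
    return preimage == set(S)


# Toy hypergraph system: J = {1, 2, 3}, each V_j = {0, 1}, d = 2, H = binom(J, 2).
J = {1, 2, 3}
V = {j: [0, 1] for j in J}
H = subsets_of_size(J, 2)

# S = {x : x_1 = x_2} depends only on coordinates 1 and 2.
S = {x for x in product([0, 1], repeat=3) if x[0] == x[1]}
```

Here `S` lies in $\mathcal{A}_{\{1,2\}}$ but not in $\mathcal{A}_{\{1,3\}}$, since its membership condition cannot be decided from $x_1$ and $x_3$ alone.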

Remark 1.6. Very roughly speaking, a hypergraph system corresponds to the notion
of a measure-preserving system in ergodic theory, though with the notable difference
that no analogue of the shift operator exists in a hypergraph system. Indeed the
$V_j$ are simply finite sets, and need not have any additive structure whatsoever.

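Theorem 1.7 (Hypergraph removal lemma). [8], [12], [14], [15], [22] Let
$V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph system. For each $e \in H$, let $E_e$ be a set in
$\mathcal{A}_e$ such that

$$\mathbf{E}\Big( \prod_{e \in H} 1_{E_e}(x) \,\Big|\, x \in V_J \Big) \leq \delta \qquad (1)$$

for some $0 < \delta \leq 1$. Then for each $e \in H$ there exists a set $E'_e \in \mathcal{A}_e$ such that

$$\bigcap_{e \in H} E'_e = \emptyset$$

and

$$\mathbf{E}( 1_{E_e \setminus E'_e}(x) | x \in V_J ) = o_{\delta \to 0; J}(1) \hbox{ for all } e \in H.$$

Furthermore, there exist sub-algebras $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ whenever $e' \subseteq J$ and $|e'| < d$,
obeying the complexity estimate

$$|\mathcal{B}_{e'}| = O_{J,\delta}(1) \hbox{ whenever } e' \subseteq J \hbox{ and } |e'| < d$$

and

$$E'_e \in \bigvee_{e' \subsetneq e} \mathcal{B}_{e'} \hbox{ for all } e \in H.$$

Here of course $\bigvee_{e' \subsetneq e} \mathcal{B}_{e'}$ is the smallest $\sigma$-algebra which contains all
of the $\mathcal{B}_{e'}$.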

Remarks 1.8. For this paper, we will only need this theorem in the special case
when $d = |J| - 1$ and $H$ is the simplex hypergraph $H = \binom{J}{d}$, and when all the
$V_j$ are equal to each other (in fact, they will all be set equal to a finite additive group
$Z$). On the other hand, this special case does not seem to be significantly easier
to prove than the general case. The hypothesis (1) asserts that the sets $(E_e)_{e \in H}$
(which can be thought of as families of edges in a partite hypergraph) contain very
few copies of $H$; the hypergraph removal lemma then asserts that those copies of
$H$ can be removed by replacing the edge sets with slightly different edge sets $E'_e$
with bounded complexity. For a more detailed discussion of this lemma, we refer
the reader to the references given above.

At first glance, Theorem 1.7 has nothing to do with Theorem 1.3. However, as
observed in [16], [17], [18], [1], [8], [15], it is in fact relatively easy to deduce the
latter from the former, and we include a proof below for the reader's convenience.

Proof [of Theorem 1.3 assuming Theorem 1.7] Let us first make the "ergodic"
hypothesis that the elements $\{\phi_i(r) - \phi_j(r) : i, j \in J; r \in Z\}$ generate $Z'$ as an
additive group; we will remove this hypothesis at the end of the argument. Let
$V = (J, (V_j)_{j \in J}, d, H)$ be the hypergraph system with $V_j := Z$, $d := |J| - 1$, and
$H := \binom{J}{d}$. If $e = J \setminus \{j\}$ is an element of $H$, we define the set $E_e \subseteq V_J = Z^J$ by

$$E_e := \Big\{ (x_i)_{i \in J} \in Z^J : \sum_{i \in J} (\phi_i(x_i) - \phi_j(x_i)) \in A \Big\}.$$

Observe that the expression $\sum_{i \in J} (\phi_i(x_i) - \phi_j(x_i))$ does not actually depend on $x_j$,
and so $E_e \in \mathcal{A}_e$.

Now we compute the size of $\bigcap_{e \in H} E_e$. Let $\Phi : V_J \to Z' \times Z$ be the group
homomorphism

$$\Phi((x_i)_{i \in J}) := \Big( \sum_{j \in J} \phi_j(x_j), \; -\sum_{j \in J} x_j \Big);$$

then we see from the definitions that

$$\bigcap_{e \in H} E_e = \Phi^{-1}( \{ (a, r) \in Z' \times Z : a + \phi_j(r) \in A \hbox{ for all } j \in J \} ). \qquad (2)$$

Consider the image of the group homomorphism $\Phi$. This image contains all points
of the form $(\phi_i(r) - \phi_j(r), 0)$ for $i, j \in J$ and $r \in Z$, and hence contains $Z' \times \{0\}$
by hypothesis. It also contains all elements of the form $(-\phi_i(r), r)$ for any $r \in Z$
and $i \in J$. Hence the image must be all of $Z' \times Z$; since $\Phi$ is a homomorphism, all
the fibers $\Phi^{-1}(a, r)$ thus have the same cardinality. We conclude

$$\mathbf{E}\Big( \prod_{e \in H} 1_{E_e}(x) \,\Big|\, x \in V_J \Big) = \mathbf{E}\Big( \prod_{j \in J} 1_A(a + \phi_j(r)) \,\Big|\, a \in Z'; r \in Z \Big) \leq \delta$$

by hypothesis. Applying Theorem 1.7, we can find $E'_e \in \mathcal{A}_e$ such that

$$\bigcap_{e \in H} E'_e = \emptyset$$

and

$$|E_e \setminus E'_e| = o_{\delta \to 0; |J|}(|V_J|) \hbox{ for all } e \in H.$$

We have additional information on the "complexity" of $E'_e$ but we will not need it
for this argument.

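Next, from (2) we see in particular that

$$\Phi^{-1}(A \times \{0\}) \subseteq \bigcap_{e \in H} E_e;$$

since $\bigcap_{e \in H} E'_e = \emptyset$, we conclude that

$$\Phi^{-1}(A \times \{0\}) \subseteq \bigcup_{e \in H} (E_e \setminus E'_e).$$

Thus by the pigeonhole principle there exists an $e = J \setminus \{j\}$ such that

$$|(E_e \setminus E'_e) \cap \Phi^{-1}(A \times \{0\})| \geq \frac{1}{|J|} |\Phi^{-1}(A \times \{0\})|.$$

The set $\Phi^{-1}(A \times \{0\})$ lives in the hyperplane $\{ (x_i)_{i \in J} : \sum_{i \in J} x_i = 0 \}$; in
particular the projection map $\pi_e : V_J \to V_e$, which has multiplicity $|Z|$ everywhere,
is injective on $\Phi^{-1}(A \times \{0\})$. Since $E_e \setminus E'_e$ lies in $\mathcal{A}_e$ and is thus a union of
fibers of $\pi_e$, we have

$$|(E_e \setminus E'_e) \cap \Phi^{-1}(A \times \{0\})| \leq \frac{|E_e \setminus E'_e|}{|Z|} = o_{\delta \to 0; |J|}\Big( \frac{|V_J|}{|Z|} \Big).$$

Since $\Phi$ is a surjective group homomorphism from $V_J$ to $Z' \times Z$, we have

$$\frac{|\Phi^{-1}(A \times \{0\})|}{|V_J|} = \frac{|A \times \{0\}|}{|Z' \times Z|} = \frac{1}{|Z|} \frac{|A|}{|Z'|}.$$

Combining these inequalities we obtain $|A| = o_{\delta \to 0; |J|}(|Z'|)$ as claimed.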

To remove the ergodic hypothesis, we let $G$ be the subgroup of $Z'$ generated by
the elements $\{\phi_i(r) - \phi_j(r) : i, j \in J; r \in Z\}$. We foliate $Z'$ into $|Z'|/|G|$ cosets of
$G$. An easy counting argument shows that on all but $O(\sqrt{\delta} |Z'|/|G|)$ of these cosets
$y + G$, we have

$$\mathbf{E}\Big( \prod_{j \in J} 1_A(x + \phi_j(r)) \,\Big|\, x \in y + G; r \in Z \Big) \leq \sqrt{\delta}.$$

Applying the previous argument to each of these cosets, we conclude

$$|A \cap (y + G)| = o_{\delta \to 0; |J|}(|G|)$$

for each of these cosets. Adding up the contributions for all of these cosets, as well
as the $O(\sqrt{\delta} |Z'|/|G|)$ exceptional cosets, we obtain $|A| = o_{\delta \to 0; |J|}(|Z'|)$ as claimed.
■
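The two facts about $\Phi$ used in the proof above (all fibers have equal cardinality, and the average of $\prod_{e \in H} 1_{E_e}$ over $V_J$ equals the average of $\prod_{j \in J} 1_A(a + \phi_j(r))$ over $Z' \times Z$) can be checked by exhaustive enumeration in a toy case. The sketch below assumes $Z = Z' = \mathbf{Z}/5\mathbf{Z}$, $J = \{0, 1, 2\}$, and the homomorphisms $\phi_j(r) = jr$, which satisfy the ergodic hypothesis; the set `A` is an arbitrary choice made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N = 5            # Z = Z' = Z/5Z (assumed toy example)
J = [0, 1, 2]    # index set; phi_j(r) = j * r mod N
A = {0, 1, 3}    # an arbitrary subset of Z'


def phi(j, r):
    return (j * r) % N


def Phi(x):
    """The homomorphism Phi : V_J -> Z' x Z from the proof."""
    a = sum(phi(j, x[j]) for j in J) % N
    r = (-sum(x)) % N
    return a, r


def in_E(j, x):
    """Membership of x in E_e for e = J minus {j}."""
    return sum(phi(i, x[i]) - phi(j, x[i]) for i in J) % N in A


V_J = list(product(range(N), repeat=len(J)))

# Every fiber of Phi should have the same size |V_J| / |Z' x Z|.
fibre_sizes = Counter(Phi(x) for x in V_J)

# Left-hand side: average over V_J of the product of indicators of the E_e.
lhs = Fraction(sum(all(in_E(j, x) for j in J) for x in V_J), len(V_J))

# Right-hand side: average over (a, r) of the product of 1_A(a + phi_j(r)).
rhs = Fraction(
    sum(all((a + phi(j, r)) % N in A for j in J)
        for a in range(N) for r in range(N)),
    N * N,
)
```

Exact rational arithmetic is used so that the two averages can be compared for equality rather than up to rounding.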

Remark 1.9. Note in the above proof we did not need the complexity bounds on 
E'^. However, this fact will be important for us when we transfer this result to a 
weighted setting below. The point is that the pseudorandom weight function which 
we will introduce will be uniformly distributed with respect to lower order sets but 
not with respect to arbitrary sets. 

Our proof of the number-theoretic results of this paper, and in particular Theorem
1.2, proceeds by a three-stage process similar to that in [9]. Firstly, we apply
the transference philosophy from [9] to extend Theorem 1.7 to a relative version
of that theorem, weighted by a pseudorandom system $(\nu_e)_{e \in H}$ of measures; this
shall be done by following the arguments in [9] closely, the main observation being
that those arguments did not significantly rely on any additive structure in the
underlying system and thus generalize from the ergodic system $\mathbf{Z}_N$ to an arbitrary
hypergraph system without any fundamental new difficulties. Next, by repeating
the deduction of Theorem 1.3 from Theorem 1.7, we obtain a relative version of
Theorem 1.3, in which the set $A$ is measured with respect to a pseudorandom
measure $\nu$; this step of the argument is quite easy. Finally, we apply this relative
version of Theorem 1.3 to the Gaussian primes by constructing a pseudorandom
majorant for these primes in the spirit of the work of Goldston and Yıldırım (with
some additional simplifications introduced in [24]).

One additional technical complication which appears in this work is that the
Gaussian primes (or almost primes) contain certain correlations which are not present in
the rational case. In particular, the Gaussian (almost) primes have a different
density on lines such as the real line than they do on all of $\mathbf{Z}[i]$. Also, there is
an obvious correlation between $p$ being a Gaussian (almost) prime and its conjugate
$\overline{p}$ being a Gaussian (almost) prime. We shall eliminate the first type of correlation
by excluding the "exceptional" Gaussian primes whose norm is not a rational prime.
The second type of correlation cannot be eliminated so easily, but fortunately its
contributions to the error terms are ultimately manageable.

The author is supported by a grant from the Packard foundation. The author also 
thanks Timothy Gowers and Ben Green for some helpful conversations. 

2. Pseudorandomness

Before we can state our relative versions of Theorem 1.7 and Theorem 1.3, we must
introduce the notion of a pseudorandom system of measures $(\nu_e)_{e \in H}$ on a
hypergraph system $V = (J, (V_j)_{j \in J}, d, H)$. Strictly speaking, the concept of
pseudorandomness will not be associated with a single system of measures on a hypergraph
system, but rather on a one-parameter family of measures $(\nu_e)_{e \in H} = (\nu_e^{(N)})_{e \in H}$
on a hypergraph system $V = V^{(N)}$, where $N$ ranges over a sequence of numbers
tending to infinity (e.g. $N$ could range over the primes). This is in order to make
sense of error terms such as $o_{N \to \infty}(1)$. The concept of a pseudorandom system is
closely analogous to that of a pseudorandom measure in [9], where the hypergraph
system was replaced by the ergodic system $\mathbf{Z}/N\mathbf{Z}$.

In the rest of this paper, we fix the finite set $J$ and the index $d$, as well as the
hypergraph $H \subseteq \binom{J}{d}$; in particular, these objects will not depend on the parameter
$N$. We will allow all implicit constants in the $O()$ and $o()$ notation to depend on $J$,
$d$, and $H$; indeed, since for any fixed $J$ there are only finitely many possible values
of $d$ and $H$, this is the same as requiring all implicit constants to depend on $J$.

Definition 2.1 (System of measures). We define a system of measures $(\nu_e)_{e \in H}$ to
be a hypergraph system $V = V^{(N)} = (J, (V_j^{(N)})_{j \in J}, d, H)$ depending on a parameter
$N$ (ranging over a sequence of numbers tending to infinity), together with a
collection of non-negative functions $\nu_e = \nu_e^{(N)} : V_e^{(N)} \to \mathbf{R}^+$, obeying the normalization
condition

$$\mathbf{E}( \nu_e(x_e) | x_e \in V_e ) = 1 + o_{N \to \infty}(1). \qquad (3)$$

We will usually suppress the dependence of $V$ and $(\nu_e)_{e \in H}$ on the parameter $N$.

Example 2.2. One could set $V_j^{(N)} = \mathbf{Z}/N\mathbf{Z}$, and let $\nu_e : (\mathbf{Z}/N\mathbf{Z})^e \to \mathbf{R}^+$ be a
random function such that for each $x \in (\mathbf{Z}/N\mathbf{Z})^e$, $\nu_e(x) = \log N$ with independent
probability $1/\log N$, and $\nu_e(x) = 0$ otherwise. Then with high probability, $(\nu_e)_{e \in H}$
will be a system of measures, and it will also with high probability satisfy the
pseudorandomness conditions we shall give shortly. For a more sophisticated example,
see Example 2.12 below.
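The random model of Example 2.2 is easy to simulate. The following sketch, with the assumed helper names `random_measure` and `mean_value` and an arbitrarily chosen seed, samples $\nu_e$ on $(\mathbf{Z}/N\mathbf{Z})^e$ for $|e| = 2$ and computes its empirical mean, which should be close to 1 in line with the normalization condition (3).

```python
import math
import random
from itertools import product


def random_measure(N, e_size, rng):
    """Sample nu_e on (Z/NZ)^e: each value is log N with independent
    probability 1/log N, and 0 otherwise."""
    log_n = math.log(N)
    return {
        x: (log_n if rng.random() < 1.0 / log_n else 0.0)
        for x in product(range(N), repeat=e_size)
    }


def mean_value(nu, power=1):
    """Empirical average of nu^power over its domain."""
    return sum(v ** power for v in nu.values()) / len(nu)


rng = random.Random(0)          # fixed seed, an arbitrary choice
nu = random_measure(200, 2, rng)
first_moment = mean_value(nu)   # should be close to 1
second_moment = mean_value(nu, power=2)  # grows like log N
```

In this model the second moment is about $\log N$, so it is unbounded as $N \to \infty$ even though the mean stays near 1.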

Remark 2.3. Note we do not require that $\nu_e$ be bounded, or even that it obey any
sort of $L^p$ type moment condition (for instance, $\mathbf{E}(\nu_e(x_e)^2 | x_e \in V_e)$ need not be
bounded). Indeed, for applications to number theory (or indeed to any application
involving sets of Banach density zero) it is vital that we allow these moments to be
unbounded. However, we shall shortly require that various correlations involving
the $\nu_e$ be bounded.

The condition (3) is not strong enough by itself for our applications, and we must 
supplement it with three conditions, the dual function condition, the linear forms 
condition and the correlation condition. These closely mimic the conditions of the 
same name in [9], (where the dual function condition and linear forms condition 
were combined into a single (affine-)linear forms condition), though there are some 
minor technical differences. 

Definition 2.4 (Discrete cube). If $e$ is a finite set, we let $\{0,1\}^e$ be the set of all
binary $e$-tuples $\omega = (\omega_j)_{j \in e}$ where each $\omega_j$ is either 0 or 1. Observe that $\{0,1\}^e$
contains in particular the zero $e$-tuple $0^e := (0)_{j \in e}$ and the one $e$-tuple $1^e := (1)_{j \in e}$.
If $x_J^{(0)} = (x_j^{(0)})_{j \in J}$ and $x_J^{(1)} = (x_j^{(1)})_{j \in J}$ are two elements of $V_J$, $e$ is a subset of $J$,
and $\omega \in \{0,1\}^e$ is a binary $e$-tuple, we define $x_e^{(\omega)} \in V_e$ to be the element

$$x_e^{(\omega)} := (x_j^{(\omega_j)})_{j \in e}.$$

We abbreviate $x_e^{(0^e)}$ as $x_e^{(0)}$, thus

$$x_e^{(0)} := (x_j^{(0)})_{j \in e},$$

and define $x_e^{(1)}$ similarly.

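Definition 2.5 (Dual function). Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph
system, and let $e \in H$. If $f : V_e \to \mathbf{R}$ is a function, we define its dual function
$\mathcal{D}_e f : V_e \to \mathbf{R}$ by the formula

$$\mathcal{D}_e f(x_e^{(0)}) := \mathbf{E}\Big( \prod_{\omega \in \{0,1\}^e : \omega \neq 0^e} f(x_e^{(\omega)}) \,\Big|\, x_e^{(1)} \in V_e \Big) \qquad (4)$$

for all $x_e^{(0)} \in V_e$.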

Example 2.6. If $e = \{1,2\}$, then

$$\mathcal{D}_{\{1,2\}}(f)(x_1, x_2) = \mathbf{E}( f(x_1, x'_2) f(x'_1, x_2) f(x'_1, x'_2) | x'_1 \in V_1, x'_2 \in V_2 ).$$

The dual functions will be an indispensable tool in our analysis of the Gowers cube
norms $\|f_e\|_{\Box^e}$, which we shall introduce later and which will play a pivotal role in
our arguments.

Definition 2.7 (Dual function condition). A system of measures $(\nu_e)_{e \in H}$ on the
hypergraph system $V = (J, (V_j)_{j \in J}, d, H)$ is said to obey the dual function condition
if one has the pointwise estimate

$$\mathcal{D}_e(\nu_e + 1)(x_e^{(0)}) = O(1)$$

for all $e \in H$ and $x_e^{(0)} \in V_e$.

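Definition 2.8 (Linear forms condition). A system of measures $(\nu_e)_{e \in H}$ on the
hypergraph system $V = (J, (V_j)_{j \in J}, d, H)$ is said to obey the linear forms condition
if one has

$$\mathbf{E}\Big( \prod_{e \in H} \prod_{\omega \in \{0,1\}^e} \nu_e(x_e^{(\omega)})^{n_{e,\omega}} \,\Big|\, x_J^{(0)}, x_J^{(1)} \in V_J \Big) = 1 + o_{N \to \infty}(1) \qquad (5)$$

for any choice of exponents $n_{e,\omega} \in \{0, 1\}$.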

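Example 2.9. If $J = \{1,2,3\}$, $d = 2$, and $H = \binom{J}{2}$, then (5) asserts that

$$\mathbf{E}\Big( \prod_{ij = 12, 23, 31} \nu_{ij}(x_i, x_j) \nu_{ij}(x'_i, x_j) \nu_{ij}(x_i, x'_j) \nu_{ij}(x'_i, x'_j) \,\Big|\, x_1, x'_1 \in V_1, x_2, x'_2 \in V_2, x_3, x'_3 \in V_3 \Big) = 1 + o_{N \to \infty}(1),$$

and similarly if one or more of the twelve factors of $\nu$ in the expectation is deleted.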

Remark 2.10. The condition (5) can be viewed as a fairly strong assertion of
independence between the quantities $\nu_e(x_e^{(\omega)})$; they in particular imply that each
weight $\nu_e$ is pseudorandom in the sense of [11], [8] but are significantly stronger
than those bounds alone. Note that most instances of (5) are coupled expressions
which involve several of the $\nu_e$ in some entangled way; it may be possible to use
multiple applications of the Cauchy-Schwarz inequality to replace these conditions
by "pure" pseudorandomness conditions involving each of the $\nu_e$ separately, but we
have not sought to do so here.

Remark 2.11. Note that the linear forms condition (5) implies (3) as a special case
(when all but one of the exponents $n_{e,\omega}$ is set to zero). However we have chosen to
isolate (3) for expository reasons, to emphasize the normalized nature of the $\nu_e$.
Example 2.12. A model instance of a pseudorandom system of measures, of
relevance to number theory, is as follows. Let $J = \{1, \ldots, k\}$, let $d = k - 1$, and
$H = \binom{J}{d}$. Let $N$ be a very large integer, and let $w = w(N)$ be a moderately large
integer growing slowly with $N$ (so $1/w = o_{N \to \infty}(1)$). Let $W = \prod_{p < w} p$ be the product
of the rational primes less than $w$, and let $b_1, \ldots, b_k$ be integers in $\{0, \ldots, W - 1\}$
such that $\sum_{1 \leq i \leq k} (i - j) b_i$ is coprime to $W$ for each $1 \leq j \leq k$. For each $j \in J$, let
$V_j$ be the set

$$V_j := \{ Wn + b_j : 1 \leq n \leq N \}$$
and for each $e = J \setminus \{j\}$, let $\nu_e : V_e \to \mathbf{R}^+$ be the function

$$\nu_e((x_i)_{i \in e}) := \frac{\phi(W)}{W} \Lambda\Big( \sum_{i \in e} (i - j) x_i \Big),$$

where $\phi(W)$ is the Euler totient function of $W$ and $\Lambda$ is the von Mangoldt
function. Then, assuming a certain strong form of the Hardy-Littlewood prime tuples
conjecture, this system of measures will obey the linear forms condition if $w$ is a
sufficiently slowly growing function of $N$. Of course, verifying the prime tuples
conjecture is considered to be impossible with current technology; however, by modifying
the arguments in [9] one can replace the normalized von Mangoldt function $\frac{\phi(W)}{W} \Lambda$
by a slightly larger pseudorandom function $\nu$ (essentially a truncated divisor sum
of Goldston-Yıldırım type) for which these types of conditions can be much more
easily verified. See [9].



In addition to controlling dual functions and linear form expectations, we will also
need to control correlations (involving only a single measure $\nu_e$) in which both
vertices $x_i^{(0)}, x_i^{(1)}$ from a vertex set $V_i$ are fixed; this quantity then measures some sort
of pair correlation between $x_{e \setminus \{j\}}^{(0)}$ and $x_{e \setminus \{j\}}^{(1)}$. For such expressions one cannot
expect a uniform bound such as $1 + o_{N \to \infty}(1)$ or even $O(1)$, because the diagonal case
$x_{e \setminus \{j\}}^{(0)} = x_{e \setminus \{j\}}^{(1)}$ will almost certainly have an abnormally large (and unbounded)
correlation. In number theoretic applications (such as Example 2.12), there are a
few other cases where the correlation is expected to be abnormally large, notably
when $\sum_{i \in e \setminus \{j\}} (x_i^{(0)} - x_i^{(1)})$ has an extremely large number of small prime factors.
These correlations can become unbounded (thanks to the divergence of the Euler
product $\prod_p (1 - \frac{1}{p})^{-1}$). However, the correlations will still be bounded on the
average, and even have bounded moments of any given order. More precisely, we
have

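Definition 2.13 (Correlation condition). A system of measures $(\nu_e)_{e \in H}$ on the
hypergraph system $V = (J, (V_j)_{j \in J}, d, H)$ is said to obey the correlation condition
if we have

$$\mathbf{E}\Big( \mathbf{E}\Big( \prod_{\omega \in \{0,1\}^e} \nu_e(x_e^{(\omega)})^{n_{e,\omega}} \,\Big|\, x_j^{(0)}, x_j^{(1)} \in V_j \Big)^K \,\Big|\, x_{e \setminus \{j\}}^{(0)}, x_{e \setminus \{j\}}^{(1)} \in V_{e \setminus \{j\}} \Big) = O_K(1) \qquad (6)$$

for every $e \in H$, $j \in e$, any choice of exponents $n_{e,\omega} \in \{0,1\}$, and any integer
$K \geq 0$.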
Definition 2.14 (Pseudorandom system). A system of measures $(\nu_e)_{e \in H}$ on the
hypergraph system $V = (J, (V_j)_{j \in J}, d, H)$ is said to be pseudorandom if it obeys the
dual function condition, the linear forms condition and the correlation condition.

The system $(1)_{e \in H}$ is a rather trivial example of a pseudorandom system of
measures. More generally, we have the following simple but handy lemma that says
that the arithmetic mean of a pseudorandom system $(\nu_e)_{e \in H}$ with $(1)_{e \in H}$ is also
pseudorandom:

Lemma 2.15. Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph system, and let $(\nu_e)_{e \in H}$
be a system of pseudorandom measures. Then $(\frac{1 + \nu_e}{2})_{e \in H}$ is also a system of
pseudorandom measures (perhaps with slightly different constants in the $o()$ and
$O()$ notations).

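Proof. This is a reprise of [9, Lemma 5.2]. The dual function condition follows
from the pointwise estimate

$$\mathcal{D}_e\Big( \frac{1 + \nu_e}{2} + 1 \Big) \leq \mathcal{D}_e\Big( \frac{3}{2} (\nu_e + 1) \Big) = \Big( \frac{3}{2} \Big)^{2^d - 1} \mathcal{D}_e(\nu_e + 1).$$

As for the linear forms and correlation conditions, from the binomial formula we
have

$$\prod_{e \in H} \prod_{\omega \in \{0,1\}^e} \Big( \frac{1 + \nu_e(x_e^{(\omega)})}{2} \Big)^{n_{e,\omega}}
= \mathbf{E}\Big( \prod_{e \in H} \prod_{\omega \in \{0,1\}^e} \nu_e(x_e^{(\omega)})^{n_{e,\omega} m_{e,\omega}} \,\Big|\, m_{e,\omega} \in \{0,1\} \hbox{ for all } e \in H, \omega \in \{0,1\}^e \Big)$$

and the claim follows by linearity of expectation. ■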

In [9], Szemerédi's theorem was extended via a "transference principle" to a relative
version, weighted with a pseudorandom measure. In this paper we shall apply the
same transference principle to extend Theorem 1.7 to a relative version, which we
state as follows.

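Theorem 2.16 (Relative hypergraph removal lemma). Let $V = (J, (V_j)_{j \in J}, d, H)$
be a hypergraph system, and let $(\nu_e)_{e \in H}$ be a system of pseudorandom measures.
For each $e \in H$, let $E_e$ be a set in $\mathcal{A}_e$ such that

$$\mathbf{E}\Big( \prod_{e \in H} 1_{E_e}(x) \nu_e(\pi_e(x)) \,\Big|\, x \in V_J \Big) \leq \delta \qquad (7)$$

for some $0 < \delta \leq 1$. Then, if $N$ is sufficiently large depending on $\delta$ and $J$, for each
$e \in H$ there exists a set $E'_e \in \mathcal{A}_e$ such that

$$\bigcap_{e \in H} E'_e = \emptyset$$

and

$$\mathbf{E}( 1_{E_e \setminus E'_e}(x) \nu_e(\pi_e(x)) | x \in V_J ) = o_{\delta \to 0}(1) + o_{N \to \infty; \delta}(1) \hbox{ for all } e \in H. \qquad (8)$$

Recall that all constants are allowed to depend on $J$. Furthermore, there exists a
$\sigma$-algebra $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ for all $e'$ with $|e'| < d$ such that

$$|\mathcal{B}_{e'}| = O_\delta(1) \hbox{ whenever } e' \subseteq J \hbox{ and } |e'| < d$$

and

$$E'_e \in \bigvee_{e' \subsetneq e} \mathcal{B}_{e'} \hbox{ for all } e \in H.$$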

The proof of Theorem 2.16 is lengthy and shall occupy Sections 3-7. Just as
Theorem 1.7 implies Theorem 1.3, Theorem 2.16 implies the following relative version
of Theorem 1.3.

Theorem 2.17 (Relative multidimensional Szemerédi's theorem). Let $Z, Z'$ be two
finite additive groups, and let $(\phi_j)_{j \in J}$ be a finite collection of group homomorphisms
$\phi_j : Z \to Z'$ from $Z$ to $Z'$. We assume the ergodic
hypothesis that the elements $\{\phi_i(r) - \phi_j(r) : i, j \in J, r \in Z\}$ generate $Z'$ as an
abelian group. Let $\nu : Z' \to \mathbf{R}^+$ be a non-negative function, with the property that
in the hypergraph system $V = (J, (V_j)_{j \in J}, d, H)$ with $V_j := Z$, $d := |J| - 1$, $H := \binom{J}{d}$,
the collection $(\nu_e)_{e \in H}$ defined by

$$\nu_{J \setminus \{j\}}((x_i)_{i \in J \setminus \{j\}}) := \nu\Big( \sum_{i \in J} (\phi_i(x_i) - \phi_j(x_i)) \Big)$$

is a pseudorandom family of measures. Then if $A$ is a subset of $Z'$ such that

$$\mathbf{E}\Big( \prod_{j \in J} 1_A(x + \phi_j(r)) \nu(x + \phi_j(r)) \,\Big|\, x \in Z'; r \in Z \Big) \leq \delta \qquad (9)$$

for some $0 < \delta \leq 1$, then we have

$$\mathbf{E}( 1_A(x) \nu(x) | x \in Z' ) = o_{\delta \to 0; |J|}(1) + o_{N \to \infty; \delta}(1).$$

Proof [of Theorem 2.17 assuming Theorem 2.16] This shall be a reprise of the proof
of Theorem 1.3. We may assume $N$ is large since the claim is trivial otherwise. As
in that proof, we define the set $E_e \in \mathcal{A}_e$ for each element $e = J \setminus \{j\}$ of $H$ as

$$E_e := \Big\{ (x_i)_{i \in J} \in Z^J : \sum_{i \in J} (\phi_i(x_i) - \phi_j(x_i)) \in A \Big\},$$

and we recall the group homomorphism $\Phi : V_J \to Z' \times Z$ defined by

$$\Phi((x_i)_{i \in J}) := \Big( \sum_{j \in J} \phi_j(x_j), \; -\sum_{j \in J} x_j \Big).$$

Then we have

$$\prod_{e \in H} 1_{E_e}(x) \nu_e(\pi_e(x)) = \prod_{j \in J} 1_A(a + \phi_j(r)) \nu(a + \phi_j(r))$$

for all $x \in V_J$, where $(a, r) := \Phi(x)$. From the ergodic hypothesis, $\Phi$ is surjective,
and hence all the fibers $\Phi^{-1}(a, r)$ have the same cardinality. Thus

$$\mathbf{E}\Big( \prod_{e \in H} 1_{E_e}(x) \nu_e(\pi_e(x)) \,\Big|\, x \in V_J \Big) = \mathbf{E}\Big( \prod_{j \in J} 1_A(a + \phi_j(r)) \nu(a + \phi_j(r)) \,\Big|\, (a, r) \in Z' \times Z \Big) \leq \delta$$

by hypothesis. Applying Theorem 2.16 (for $N$ large enough), we can find $E'_e \in \mathcal{A}_e$
such that

$$\bigcap_{e \in H} E'_e = \emptyset \qquad (10)$$

and

$$\mathbf{E}( 1_{E_e \setminus E'_e}(x) \nu_e(\pi_e(x)) | x \in V_J ) = o_{\delta \to 0; |J|}(1) + o_{N \to \infty; \delta, |J|}(1) \hbox{ for all } e \in H. \qquad (11)$$

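Once again, we will not need to use the additional complexity information on $E'_e$.

Next, we observe from the definition of $E_e$ that

$$1_A(a) 1_{r=0} = 1_{r=0} \prod_{e \in H} 1_{E_e}(x)$$

for all $x \in V_J$, where $(a, r) = \Phi(x)$ as before. From (10) we conclude that

$$1_A(a) 1_{r=0} \leq \sum_{e \in H} 1_{r=0} 1_{E_e \setminus E'_e}(x).$$

Multiplying by $\nu(a)$, averaging in $x$, and then applying the pigeonhole principle,
there exists an $e = J \setminus \{j\}$ in $H$ such that

$$\frac{1}{|J|} \mathbf{E}( 1_A(a) 1_{r=0} \nu(a) | x \in V_J ) \leq \mathbf{E}( 1_{r=0} \nu(a) 1_{E_e \setminus E'_e}(x) | x \in V_J ).$$

Observe that $\nu(a) = \nu_e(\pi_e(x))$ when $r = 0$. Also, recall that the fibers $\Phi^{-1}(a, r)$ all
have equal cardinality. Thus we have

$$\frac{1}{|J|} \mathbf{E}( 1_A(a) 1_{r=0} \nu(a) | (a, r) \in Z' \times Z ) \leq \mathbf{E}( 1_{r=0} \nu_e(\pi_e(x)) 1_{E_e \setminus E'_e}(x) | x \in V_J ).$$

Since the function $\nu_e(\pi_e(x)) 1_{E_e \setminus E'_e}(x)$ does not depend on the $x_j$ variable, and
the constraint $r = 0$ forces $x_j$ to be determined by all the other variables, we have

$$\mathbf{E}( 1_{r=0} \nu_e(\pi_e(x)) 1_{E_e \setminus E'_e}(x) | x \in V_J ) = \frac{1}{|Z|} \mathbf{E}( \nu_e(\pi_e(x)) 1_{E_e \setminus E'_e}(x) | x \in V_J ).$$

Also, we have $\mathbf{E}( 1_A(a) 1_{r=0} \nu(a) | (a, r) \in Z' \times Z ) = \frac{1}{|Z|} \mathbf{E}( 1_A(a) \nu(a) | a \in Z' )$. Thus

$$\mathbf{E}( 1_A(a) \nu(a) | a \in Z' ) \leq |J| \, \mathbf{E}( \nu_e(\pi_e(x)) 1_{E_e \setminus E'_e}(x) | x \in V_J )$$

and the claim follows from (11). ■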

Remarks 2.18. The complexity bound was not used in this argument; however, we
will need the complexity bound from Theorem 1.7 in order to successfully transfer
that theorem to the relative setting. The ergodic hypothesis can be dropped by
foliating $Z'$ into cosets as in the proof of Theorem 1.3, but one has to modify the
pseudorandomness hypotheses on $\nu$ accordingly; we omit the details.

Remark 2.19. Theorem 2.17 can be used to prove a slight variant of the relative
Szemerédi theorem in [9, Theorem 3.5] (with some minor variations in the linear forms
and correlation condition). This is unsurprising given that the proof of Theorem
2.17 given here closely follows the proof of that theorem in [9].

In the next few sections we shall prove Theorem 2.16, and hence Theorem 2.17. In 
the second half of the paper (from Section 8 onwards) we shall apply Theorem 2.17 
to questions concerning the primes and Gaussian primes. 
3. The Gowers cube norm, and overview of proof of Theorem 2.16

In this section we shall recall the Gowers cube norm $\|f\|_{\Box^e}$, which shall be a
fundamental tool in our proof of Theorem 2.16, playing a role closely analogous to that
of the Gowers uniformity norm $\|f\|_{U^d}$ in [9]. We will then use this norm to split the
proof of Theorem 2.16 into four components. One component is a weighted version
(Theorem 3.7) of the hypergraph removal lemma, which is a minor generalization
of Theorem 1.7. Another component will be a generalized von Neumann theorem
(Theorem 3.8), which essentially asserts that functions with small cube norm have
a negligible impact on the quantity (7). A third component is a structure
theorem, which decomposes the function $1_{E_e} \nu_e$ into a bounded non-negative function
(which can be dealt with using Theorem 1.7) and a remainder with small cube
norm (which can be dealt with using Theorem 3.8), plus a negligible error. Finally
(and this is where we need the complexity information from Theorem 1.7), we need
a simple result (Corollary 3.6) which asserts that functions with small cube norm
are uniformly distributed with respect to lower order sets.

We now turn to the details. We begin by defining the Gowers cube norm. 

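Definition 3.1 (Gowers cube norm). Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph
system, let $e$ be an element of $H$, and let $f : V_e \to \mathbf{R}$ be a function. We define the
Gowers cube norm $\|f\|_{\Box^e}$ of $f$ to be the quantity

$$\|f\|_{\Box^e} := \mathbf{E}\Big( \prod_{\omega \in \{0,1\}^e} f(x_e^{(\omega)}) \,\Big|\, x_e^{(0)}, x_e^{(1)} \in V_e \Big)^{1/2^{|e|}}.$$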

Examples 3.2. If $e$ is empty, $e = \emptyset$, then $V_e$ is a singleton set, and $\|f\|_{\Box^e}$ is
simply equal to the single value of $f$ on $V_e$; in particular $\|f\|_{\Box^e}$ can be negative in
this case. If $e$ is a point, thus $e = \{j\}$, then

$$\|f\|_{\Box^e} = \mathbf{E}( f(x_j^{(0)}) f(x_j^{(1)}) | x_j^{(0)}, x_j^{(1)} \in V_j )^{1/2} = |\mathbf{E}( f(x) | x \in V_j )|.$$

In particular, the $\Box^e$ "norm" is only a semi-norm in this case. If $e$ consists of two
points, thus $e = \{i, j\}$, then

$$\|f\|_{\Box^e} = \mathbf{E}( f(x_e^{(0,0)}) f(x_e^{(0,1)}) f(x_e^{(1,0)}) f(x_e^{(1,1)}) | x_e^{(0)}, x_e^{(1)} \in V_e )^{1/4}$$

$$= \mathbf{E}( f(x_i, x_j) f(x_i, x'_j) f(x'_i, x_j) f(x'_i, x'_j) | x_i, x'_i \in V_i; x_j, x'_j \in V_j )^{1/4}$$

$$= \mathbf{E}( \mathbf{E}( f(x_i, x_j) f(x'_i, x_j) | x_j \in V_j )^2 | x_i, x'_i \in V_i )^{1/4}.$$

Thus $\|f\|_{\Box^e}$ is non-negative (and one can easily verify that it vanishes if and only if
$f$ is identically zero). In this case one can view $f$ as the kernel of a linear operator
$T$ from $V_i$ to $V_j$, and $\|f\|_{\Box^e}$ can be viewed as the square root of the normalized
Hilbert-Schmidt norm of $T^* T$, or as the 4-Schatten norm $\mathrm{tr}(T T^* T T^*)^{1/4}$. Alternatively,
one can view $f$ as a weighted bipartite graph from $V_i$ to $V_j$, and then $\|f\|_{\Box^e}$ is a
normalized count of the 4-cycles in this graph, weighted by $f$.
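For a two-point edge $e = \{i, j\}$ the two expressions for $\|f\|_{\Box^e}^4$ given above (the average over 4-cycles, and the average of squared inner expectations) can be compared numerically. The sketch below is an illustration with assumed names (`box_norm_fourth_power`, `box_norm_fourth_power_via_squares`) and an arbitrary integer-valued test function; it uses exact rational arithmetic so that the two quantities can be tested for equality.

```python
from fractions import Fraction
from itertools import product


def box_norm_fourth_power(f, Vi, Vj):
    """E f(x,y) f(x,y') f(x',y) f(x',y') over x, x' in Vi and y, y' in Vj."""
    total = sum(
        f[x, y] * f[x, y2] * f[x2, y] * f[x2, y2]
        for x, x2 in product(Vi, repeat=2)
        for y, y2 in product(Vj, repeat=2)
    )
    return Fraction(total, len(Vi) ** 2 * len(Vj) ** 2)


def box_norm_fourth_power_via_squares(f, Vi, Vj):
    """E over x, x' of ( E over y of f(x,y) f(x',y) )^2."""
    total = Fraction(0)
    for x, x2 in product(Vi, repeat=2):
        inner = Fraction(sum(f[x, y] * f[x2, y] for y in Vj), len(Vj))
        total += inner ** 2
    return total / len(Vi) ** 2


Vi, Vj = range(3), range(4)
# An arbitrary integer-valued function on Vi x Vj (illustrative choice).
f = {(x, y): (x * y - 2 * x + y) % 5 - 2 for x in Vi for y in Vj}

direct = box_norm_fourth_power(f, Vi, Vj)
via_squares = box_norm_fourth_power_via_squares(f, Vi, Vj)
```

The second form makes the non-negativity of $\|f\|_{\Box^e}^4$ evident, since it is an average of squares.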

Example 3.3. Suppose $V_j = Z$ for some abelian group $Z$, $e \in H$, and $f : V_e \to \mathbf{R}$
has the special form

$$f((x_j)_{j \in e}) = F\Big( \sum_{j \in e} x_j \Big)$$

for some function $F : Z \to \mathbf{R}$. Then $\|f\|_{\Box^e} = \|F\|_{U^d(Z)}$ where $d = |e|$ and the
$U^d(Z)$ norm is the Gowers uniformity norm, defined for instance in [7], [9], [21].
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Remark 3.4. The D*^ norm is closely related to the concept of a dual function 
Ve{f), see (22) below. Indeed, just as in [9], the complementarity between Gowers 
uniform functions - that is, functions with small norm - and Gowers anti- uniform 
functions (specifically, functions generated by dual functions) will lie at the heart 
of the transference principle that underlies this paper. 
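
The identity of Example 3.3 can be checked by brute force in the case $d = 2$. The sketch below is only an illustration: the modulus `N` and the function `F` are hypothetical choices, and the $U^2$ norm is taken in its usual form $\|F\|_{U^2}^4 = \mathbb{E}(F(x)F(x+h_1)F(x+h_2)F(x+h_1+h_2))$.

```python
from itertools import product

N = 7  # hypothetical modulus; Z = Z/NZ

def F(t):
    # hypothetical real-valued function on Z/NZ (1 on nonzero squares, -1/2 elsewhere)
    return 1.0 if (t * t) % N in (1, 2, 4) else -0.5

def box_norm_pow4():
    # E f(x1,x2) f(x1,x2') f(x1',x2) f(x1',x2') with f(x1,x2) = F(x1 + x2)
    s = sum(F((a + b) % N) * F((a + b2) % N) * F((a2 + b) % N) * F((a2 + b2) % N)
            for a, a2, b, b2 in product(range(N), repeat=4))
    return s / N ** 4

def u2_norm_pow4():
    # E F(x) F(x+h1) F(x+h2) F(x+h1+h2)
    s = sum(F(x) * F((x + h1) % N) * F((x + h2) % N) * F((x + h1 + h2) % N)
            for x, h1, h2 in product(range(N), repeat=3))
    return s / N ** 3

box4 = box_norm_pow4()
u24 = u2_norm_pow4()
```

The two averages agree because the change of variables $x = x_1 + x_2$, $h_1 = x_2' - x_2$, $h_2 = x_1' - x_1$ maps the first average onto the second, each triple being hit exactly $N$ times.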

If $|e| \ge 1$, and we split $e = e' \cup \{j\}$ for an arbitrary $j \in e$, where $e' := e \setminus \{j\}$, then
one can verify the identity

\[ \|f\|_{\Box^e} = \mathbb{E}\Big( \Big| \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{e'}} f(x_{e'}^{(\omega)}, x_j) \,\Big|\, x_j \in V_j \Big) \Big|^2 \,\Big|\, x_{e'}^{(0)}, x_{e'}^{(1)} \in V_{e'} \Big)^{1/2^{|e|}} \tag{12} \]

and thus $\|f\|_{\Box^e}$ is non-negative. One can also verify that $\|\cdot\|_{\Box^e}$ obeys the triangle
inequality when $|e| \ge 1$ (see e.g. [23]) but we will not need this fact here. One
further consequence of the identity (12) is that

\[ \|fg\|_{\Box^e} \le \|f\|_{\Box^e} \]

whenever $g : V_e \to [-1,1]$ is a bounded function which is independent of the $x_j$
variable for some $j \in e$. In particular, $g$ can be an indicator function. Iterating this
claim, we obtain

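Corollary 3.5. Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph system, and let $e \in H$.
Let $f : V_e \to \mathbb{R}$ be a function, and for each $e' \subsetneq e$ let $E_{e'}$ be a subset of $V_{e'}$. Then
we have

\[ \Big| \mathbb{E}\Big( f(x_e) \prod_{e' \subsetneq e} 1_{E_{e'}}(x_{e'}) \,\Big|\, x_e \in V_e \Big) \Big| \le \|f\|_{\Box^e}, \]

where $x_{e'}$ is the restriction of $x_e$ to $V_{e'}$ (thus if $x_e = (x_j)_{j \in e}$ then $x_{e'} = (x_j)_{j \in e'}$).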

In particular, we have the following result, which is one of four ingredients necessary
to prove Theorem 2.16. It asserts that Gowers uniform functions are uniformly
distributed across lower order sets - sets which arise from the $\sigma$-algebras $\mathcal{A}_{e'}$ with
$e'$ strictly smaller than $e$.

Corollary 3.6 (Gowers uniform functions are orthogonal to lower order sets).
Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph system, and let $(\nu_e)_{e \in H}$ be a system
of pseudorandom measures. Suppose there exist sub-algebras $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ whenever
$e' \subset J$ and $|e'| < d$ obeying the complexity estimate

\[ |\mathcal{B}_{e'}| \le M \text{ whenever } e' \subset J \text{ and } |e'| < d \]

for some $M$. For each $e \in H$, let $E'_e$ be a set in $\bigvee_{e' \subsetneq e} \mathcal{B}_{e'}$. Then we have

\[ \mathbb{E}( 1_{E'_e}(x) f(\pi_e(x)) \mid x \in V_J ) = O_M(\|f\|_{\Box^e}) \]

for any $f : V_e \to \mathbb{R}$.

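Proof We can decompose $E'_e$ as the union of $O_M(1)$ atoms of $\bigvee_{e' \subsetneq e} \mathcal{B}_{e'}$, each of
which are in turn the intersection of atoms from $\mathcal{B}_{e'}$. By the triangle inequality, it
thus suffices to show that

\[ \Big| \mathbb{E}\Big( f(\pi_e(x)) \prod_{e' \subsetneq e} 1_{F_{e'}}(x) \,\Big|\, x \in V_J \Big) \Big| \le \|f\|_{\Box^e} \]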






whenever $F_{e'} \in \mathcal{B}_{e'}$. But this follows from Corollary 3.5 after eliminating the
redundant averaging over those variables $x_j$ for which $j \in J \setminus e$. ■

The second ingredient we need to prove Theorem 2.16 is the following minor generalization
of Theorem 1.7, which does not involve a pseudorandom system of measures,
but replaces the sets by bounded weight functions.

Theorem 3.7 (Weighted hypergraph removal lemma). Let $V = (J, (V_j)_{j \in J}, d, H)$
be a hypergraph system. For each $e \in H$, let $f_e : V_e \to [0,1]$ be a bounded non-negative
function such that

\[ \mathbb{E}\Big( \prod_{e \in H} f_e(\pi_e(x)) \,\Big|\, x \in V_J \Big) \le \delta \tag{13} \]

for some $0 < \delta \le 1$. Then for each $e \in H$ there exists a set $E'_e \in \mathcal{A}_e$ such that

\[ \bigcap_{e \in H} E'_e = \emptyset \tag{14} \]

and

\[ \mathbb{E}( f_e(\pi_e(x)) 1_{V_J \setminus E'_e}(x) \mid x \in V_J ) = o_{\delta \to 0}(1) \text{ for all } e \in H. \tag{15} \]

Furthermore, there exist sub-algebras $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ whenever $e' \subset J$ and $|e'| < d$
obeying the complexity estimate

\[ |\mathcal{B}_{e'}| = O_\delta(1) \text{ whenever } e' \subset J \text{ and } |e'| < d \]

and

\[ E'_e \in \bigvee_{e' \subsetneq e} \mathcal{B}_{e'} \text{ for all } e \in H. \]

Note that Theorem 1.7 is the special case of Theorem 3.7 in the case when the $f_e$
are indicator functions.

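Proof For each $e \in H$, let $E_e \subseteq V_J$ be the set

\[ E_e := \{ x \in V_J : f_e(\pi_e(x)) \ge \delta^{1/(2|H|)} \}. \]

Clearly, $E_e \in \mathcal{A}_e$. From (13) we see that

\[ \mathbb{E}\Big( \prod_{e \in H} 1_{E_e}(x) \,\Big|\, x \in V_J \Big) \le \delta^{1/2}. \]

Applying Theorem 1.7, we obtain a set $E'_e \in \mathcal{A}_e$ for each $e \in H$ obeying (14) and
the desired complexity bounds, and such that

\[ \mathbb{E}( 1_{E_e}(x) 1_{V_J \setminus E'_e}(x) \mid x \in V_J ) = o_{\delta \to 0}(1) \text{ for all } e \in H. \]

Using the pointwise estimate $f_e(\pi_e(x)) \le 1_{E_e}(x) + \delta^{1/(2|H|)}$, we obtain (15), and the
claim follows. ■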

The third ingredient of the proof of Theorem 2.16 is the following generalized von
Neumann theorem, which we prove in Section 4. It asserts that Gowers uniform
functions have a negligible impact on averages such as those appearing in (9), even
when such functions are bounded by a pseudorandom system of measures rather
than by 1.

Theorem 3.8 (Generalized von Neumann theorem). Let $V = (J, (V_j)_{j \in J}, d, H)$ be
a hypergraph system, and let $(\nu_e)_{e \in H}$ be a system of pseudorandom measures on $V$.
For every $e \in H$, let $f_e : V_e \to \mathbb{R}$ be a function such that we have the pointwise
estimates

\[ |f_e(x_e)| \le \nu_e(x_e) \text{ for all } x_e \in V_e \text{ and } e \in H. \tag{16} \]

Then we have

\[ \mathbb{E}\Big( \prod_{e \in H} f_e(\pi_e(x)) \,\Big|\, x \in V_J \Big) = O\big( \inf_{e \in H} \|f_e\|_{\Box^e} \big) + o_{N \to \infty}(1). \]

This theorem will follow from multiple applications of the Cauchy-Schwarz inequality;
the main difficulty is that of setting up a notational system which is not too
cumbersome in order to track all the variables. It is the analogue of [9, Proposition
5.3].

The final ingredient in the proof of Theorem 2.16 is the following structure theorem,
which is the analogue of [9, Proposition 8.1]. It splits an arbitrary system of
functions (bounded by a pseudorandom system) into a bounded component, plus a
Gowers uniform component, outside of a set of negligible measure.

Theorem 3.9 (Structure theorem). Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph
system, and let $(\nu_e)_{e \in H}$ be a system of pseudorandom measures on $V$. Let $e \in H$,
and let $f_e : V_e \to \mathbb{R}^+$ be a non-negative function such that we have the pointwise
estimate

\[ 0 \le f_e(x_e) \le \nu_e(x_e) \text{ for all } x_e \in V_e. \tag{17} \]

Let $0 < \varepsilon \ll 1$ be a small parameter, and assume $N$ sufficiently large depending on
$\varepsilon$. Then there exists a $\sigma$-algebra $\mathcal{B}_e$ on $V_e$ and an exceptional set $\Omega_e \in \mathcal{B}_e$ obeying
the smallness condition

\[ \mathbb{E}( 1_{\Omega_e}(x_e) \nu_e(x_e) \mid x_e \in V_e ) = o_{N \to \infty; \varepsilon}(1) \tag{18} \]

and such that $\nu_e$ is uniformly distributed outside of $\Omega_e$:

\[ \mathbb{E}(\nu_e \mid \mathcal{B}_e)(x_e) = 1 + o_{N \to \infty; \varepsilon}(1) \text{ for all } x_e \in V_e \setminus \Omega_e. \tag{19} \]

Furthermore, we have the uniformity estimate

\[ \| (1 - 1_{\Omega_e})(f_e - \mathbb{E}(f_e \mid \mathcal{B}_e)) \|_{\Box^e} \le \varepsilon^{1/2^{d+1}}. \tag{20} \]

The proof of this theorem is somewhat lengthy and will occupy Sections 5-7. Assuming
both Theorem 3.8 and Theorem 3.9, we can now combine all the above
ingredients to prove Theorem 2.16 (and hence Theorem 2.17).

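Proof [of Theorem 2.16 assuming Theorems 3.8, 3.9] Let $V$, $(\nu_e)_{e \in H}$, $(E_e)_{e \in H}$ be
as in Theorem 2.16. Since $E_e \in \mathcal{A}_e$, we can write $E_e = \pi_e^{-1}(F_e)$ for some set
$F_e \subseteq V_e$. Let $0 < \varepsilon \ll \delta$ be a small parameter (depending on $\delta$, of course) to be
chosen later. We may assume that $N$ is large depending on $\varepsilon$ and $\delta$ as the claim is
trivial otherwise. Applying Theorem 3.9 once for each $e \in H$ with $f_e := 1_{F_e}\nu_e$, we
can find $\sigma$-algebras $\mathcal{B}_e$ on $V_e$ and sets $\Omega_e \in \mathcal{B}_e$ obeying (18), (19), (20).

Now write $f_{e,\Box} := (1 - 1_{\Omega_e})(f_e - \mathbb{E}(f_e \mid \mathcal{B}_e))$ and $f_{e,\Box^\perp} := (1 - 1_{\Omega_e})\mathbb{E}(f_e \mid \mathcal{B}_e)$, thus
$f_{e,\Box}$ and $f_{e,\Box^\perp}$ are real-valued functions on $V_e$ which add up to $(1 - 1_{\Omega_e})f_e$, which
is of course bounded by $f_e = 1_{F_e}\nu_e$. From (19), (20) we have the estimates

\[ \|f_{e,\Box}\|_{\Box^e} \le \varepsilon^{1/2^{d+1}} \]

\[ 0 \le f_{e,\Box^\perp}(x_e) \le 1 + o_{N \to \infty; \varepsilon}(1) \text{ for all } x_e \in V_e \]

\[ |f_{e,\Box}(x_e)| \le \nu_e(x_e) + 1 + o_{N \to \infty; \varepsilon}(1) \text{ for all } x_e \in V_e \]

\[ 0 \le f_{e,\Box}(\pi_e(x)) + f_{e,\Box^\perp}(\pi_e(x)) \le 1_{E_e}(x)\nu_e(\pi_e(x)) \text{ for all } x \in V_J. \]

Thus we have split $f_e$ (modulo a negligible error) into a bounded component $f_{e,\Box^\perp}$
and a component $f_{e,\Box}$ with small $\Box^e$ norm. From the latter estimate and (7) we
have

\[ \mathbb{E}\Big( \prod_{e \in H} \big( f_{e,\Box}(\pi_e(x)) + f_{e,\Box^\perp}(\pi_e(x)) \big) \,\Big|\, x \in V_J \Big) \le \delta. \]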

We split the left-hand side into $2^{|H|} = O(1)$ terms in the obvious manner. All
but one of these terms involves at least one function $f_{e,\Box}$. Applying Theorem 3.8
(using Lemma 2.15 to replace $\nu_e$ by $\frac{1}{2} + \frac{1}{2}\nu_e$, and scaling by the harmless factor
$2 + o_{N \to \infty; \varepsilon}(1)$) and the above estimates, we see that the contribution of each
such term is $O(\varepsilon^{1/2^{d+1}})$ (if $N$ is sufficiently large depending on $\varepsilon$). By the triangle
inequality, we thus conclude

\[ \mathbb{E}\Big( \prod_{e \in H} f_{e,\Box^\perp}(\pi_e(x)) \,\Big|\, x \in V_J \Big) \le \delta + O(\varepsilon^{1/2^{d+1}}) = O(\delta) \]

since we are taking $0 < \varepsilon \ll \delta^{2^{d+1}}$. We can now apply Theorem 3.7 (with $f_e$ replaced
by $f_{e,\Box^\perp}$) to obtain sets $E'_e \in \mathcal{A}_e$ for each $e \in H$ obeying (14) and

\[ \mathbb{E}( f_{e,\Box^\perp}(\pi_e(x)) 1_{V_J \setminus E'_e}(x) \mid x \in V_J ) = o_{\delta \to 0}(1) \text{ for all } e \in H. \tag{21} \]

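Furthermore, there exist sub-algebras $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ whenever $e' \subset J$ and $|e'| < d$
obeying the complexity estimate

\[ |\mathcal{B}_{e'}| = O_\delta(1) \text{ whenever } e' \subset J \text{ and } |e'| < d \]

and

\[ E'_e \in \bigvee_{e' \subsetneq e} \mathcal{B}_{e'} \text{ for all } e \in H. \]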

The only remaining thing to establish is (8). Applying Corollary 3.6 we obtain

\[ \mathbb{E}( f_{e,\Box}(\pi_e(x)) 1_{V_J \setminus E'_e}(x) \mid x \in V_J ) = O_\delta(\|f_{e,\Box}\|_{\Box^e}) = O_\delta(\varepsilon^{1/2^{d+1}}) \text{ for all } e \in H. \]

Adding this to (21) we conclude

\[ \mathbb{E}\big( (1 - 1_{\Omega_e}(\pi_e(x))) f_e(\pi_e(x)) 1_{V_J \setminus E'_e}(x) \,\big|\, x \in V_J \big) = o_{\delta \to 0}(1) \text{ for all } e \in H \]

if $\varepsilon$ is sufficiently small depending on $\delta$ (and $N$ is sufficiently large depending on
$\varepsilon$). Thus we have

\[ \mathbb{E}\big( (1 - 1_{\Omega_e}(\pi_e(x))) \nu_e(\pi_e(x)) 1_{E_e \setminus E'_e}(x) \,\big|\, x \in V_J \big) = o_{\delta \to 0}(1) \text{ for all } e \in H. \]

From this and (18) we have (8) as desired.

It now remains to prove Theorem 3.8 and Theorem 3.9, which we shall do in the
next few sections. The proofs of these theorems can be read independently of each
other.



4. A GENERALIZED VON NEUMANN THEOREM 

The purpose of this section is to prove Theorem 3.8. We shall follow the proof of
[9, Proposition 5.3] closely. The basic idea is to repeatedly use the Cauchy-Schwarz
inequality to replace each of the $f_e$ factors by a $\nu_e$ in turn, until only one function
$f_e$ remains. The key estimate for doing so is the following:

Proposition 4.1 (Cauchy-Schwarz). Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph
system, and let $(\nu_e)_{e \in H}$ be a system of pseudorandom measures on $V$. For every
$e \in H$, let $f_e : V_e \to \mathbb{R}$ be a function such that we have the pointwise estimates
(16). For any set $J' \subseteq J$, let $Q_{J'}$ denote the quantity

\[ Q_{J'} := \mathbb{E}\Big( \prod_{e \in H : J' \subseteq e} \prod_{\omega \in \{0,1\}^{J'}} f_e(x_e^{(\omega)}) \prod_{e \in H : J' \not\subseteq e} \prod_{\omega \in \{0,1\}^{e \cap J'}} \nu_e(x_e^{(\omega)}) \,\Big|\, x_J^{(0)}, x_J^{(1)} \in V_J;\ x^{(0)}_{J \setminus J'} = x^{(1)}_{J \setminus J'} \Big), \]

where we extend $\omega$ arbitrarily from $J'$ or $e \cap J'$ to $e$ (the exact choice of extension
is unimportant since $x^{(0)}_{J \setminus J'} = x^{(1)}_{J \setminus J'}$). Then for any $J' \subsetneq J$ and $j_0 \in J \setminus J'$, we have

\[ |Q_{J'}| \le (1 + o_{N \to \infty}(1)) |Q_{J' \cup \{j_0\}}|^{1/2}. \]

Example 4.2. If $J = \{1,2,3\}$ and $H = \binom{J}{2}$, then

\[ Q_\emptyset = \mathbb{E}( f_{\{1,2\}}(x_1,x_2) f_{\{2,3\}}(x_2,x_3) f_{\{3,1\}}(x_3,x_1) \mid x_1 \in V_1, x_2 \in V_2, x_3 \in V_3 ) \]

\[ Q_{\{1\}} = \mathbb{E}( f_{\{1,2\}}(x_1,x_2) f_{\{1,2\}}(x_1',x_2) \nu_{\{2,3\}}(x_2,x_3) f_{\{3,1\}}(x_3,x_1) f_{\{3,1\}}(x_3,x_1') \mid x_1, x_1' \in V_1, x_2 \in V_2, x_3 \in V_3 ) \]

\[ Q_{\{1,2\}} = \mathbb{E}( f_{\{1,2\}}(x_1,x_2) f_{\{1,2\}}(x_1',x_2) f_{\{1,2\}}(x_1,x_2') f_{\{1,2\}}(x_1',x_2') \nu_{\{2,3\}}(x_2,x_3) \nu_{\{2,3\}}(x_2',x_3) \nu_{\{3,1\}}(x_3,x_1) \nu_{\{3,1\}}(x_3,x_1') \mid x_1, x_1' \in V_1, x_2, x_2' \in V_2, x_3 \in V_3 ). \]

Proof For all pairs $(x_J^{(0)}, x_J^{(1)}) \in V_J \times V_J$ with $x^{(0)}_{J \setminus J'} = x^{(1)}_{J \setminus J'}$, let us define the
functions

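\[ F(x_J^{(0)}, x_J^{(1)}) := \prod_{e \in H : J' \subseteq e;\ j_0 \in e} \prod_{\omega \in \{0,1\}^{J'}} f_e(x_e^{(\omega)}) \]

\[ G(x_J^{(0)}, x_J^{(1)}) := \prod_{e \in H : J' \subseteq e;\ j_0 \notin e} \prod_{\omega \in \{0,1\}^{J'}} f_e(x_e^{(\omega)}) \]

\[ K(x_J^{(0)}, x_J^{(1)}) := \prod_{e \in H : J' \not\subseteq e;\ j_0 \in e} \prod_{\omega \in \{0,1\}^{e \cap J'}} \nu_e(x_e^{(\omega)}) \]

\[ L(x_J^{(0)}, x_J^{(1)}) := \prod_{e \in H : J' \not\subseteq e;\ j_0 \notin e} \prod_{\omega \in \{0,1\}^{e \cap J'}} \nu_e(x_e^{(\omega)}) \]

\[ M(x_J^{(0)}, x_J^{(1)}) := \prod_{e \in H : j_0 \notin e} \prod_{\omega \in \{0,1\}^{e \cap J'}} \nu_e(x_e^{(\omega)}), \]

then we can write

\[ Q_{J'} = \mathbb{E}( FGKL(x_J^{(0)}, x_J^{(1)}) \mid x_J^{(0)}, x_J^{(1)} \in V_J;\ x^{(0)}_{J \setminus J'} = x^{(1)}_{J \setminus J'} ). \]

Write $J_* := J \setminus \{j_0\}$. Currently, we are averaging over a pair $(x_J^{(0)}, x_J^{(1)})$ in $V_J \times V_J$
with $x^{(0)}_{J \setminus J'} = x^{(1)}_{J \setminus J'}$. But this is equivalent to averaging over a pair $(x_{J_*}^{(0)}, x_{J_*}^{(1)})$ in
$V_{J_*} \times V_{J_*}$ with $x^{(0)}_{J_* \setminus J'} = x^{(1)}_{J_* \setminus J'}$, together with an element $x_{j_0} \in V_{j_0}$, with the
understanding that $x^{(0)}_{j_0} = x^{(1)}_{j_0} = x_{j_0}$. If one performs this change of variables,
then the functions $G$ and $L$ become independent of $x_{j_0}$. Thus we can write $Q_{J'}$
(with a slight abuse of notation) as

\[ \mathbb{E}\big( \mathbb{E}( FK(x_{J_*}^{(0)}, x_{J_*}^{(1)}, x_{j_0}) \mid x_{j_0} \in V_{j_0} )\, GL(x_{J_*}^{(0)}, x_{J_*}^{(1)}) \,\big|\, x_{J_*}^{(0)}, x_{J_*}^{(1)} \in V_{J_*};\ x^{(0)}_{J_* \setminus J'} = x^{(1)}_{J_* \setminus J'} \big). \]

By the hypothesis (16), we have $|G| L \le M$ pointwise. Applying Cauchy-Schwarz,
we thus have

\[ |Q_{J'}| \le X^{1/2} Y^{1/2} \]

where

\[ X := \mathbb{E}\big( |\mathbb{E}( FK(x_{J_*}^{(0)}, x_{J_*}^{(1)}, x_{j_0}) \mid x_{j_0} \in V_{j_0} )|^2 M(x_{J_*}^{(0)}, x_{J_*}^{(1)}) \,\big|\, x_{J_*}^{(0)}, x_{J_*}^{(1)} \in V_{J_*};\ x^{(0)}_{J_* \setminus J'} = x^{(1)}_{J_* \setminus J'} \big) \]

and

\[ Y := \mathbb{E}\big( M(x_{J_*}^{(0)}, x_{J_*}^{(1)}) \,\big|\, x_{J_*}^{(0)}, x_{J_*}^{(1)} \in V_{J_*};\ x^{(0)}_{J_* \setminus J'} = x^{(1)}_{J_* \setminus J'} \big). \]

From the linear forms condition (5) we have

\[ Y = 1 + o_{N \to \infty}(1). \]

On the other hand, we can expand $X$ as

\[ \mathbb{E}\big( FK(x_{J_*}^{(0)}, x_{J_*}^{(1)}, x_{j_0}^{(0)})\, FK(x_{J_*}^{(0)}, x_{J_*}^{(1)}, x_{j_0}^{(1)})\, M(x_{J_*}^{(0)}, x_{J_*}^{(1)}) \,\big|\, x_{J_*}^{(0)}, x_{J_*}^{(1)} \in V_{J_*};\ x^{(0)}_{J_* \setminus J'} = x^{(1)}_{J_* \setminus J'};\ x_{j_0}^{(0)}, x_{j_0}^{(1)} \in V_{j_0} \big). \]

Re-inserting the definitions of $F$, $K$, $M$ and comparing this against $Q_{J' \cup \{j_0\}}$, we
conclude that $X = Q_{J' \cup \{j_0\}}$. ■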
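
In the setting of Example 4.2, the two applications of Proposition 4.1 can be tested numerically in the simplest model case $\nu_e \equiv 1$, where the $o_{N \to \infty}(1)$ error vanishes and the inequalities read $|Q_\emptyset| \le Q_{\{1\}}^{1/2}$ and $|Q_{\{1\}}| \le Q_{\{1,2\}}^{1/2}$. The sketch below is only a sanity check under that assumption; the set size `n` and the random functions `f12`, `f23`, `f31` are hypothetical.

```python
import random
from itertools import product

random.seed(0)
n = 4
V = range(n)  # hypothetical common vertex set V_1 = V_2 = V_3

# Hypothetical random functions with |f| <= 1, so that (16) holds with nu = 1.
f12 = {p: random.uniform(-1, 1) for p in product(V, V)}
f23 = {p: random.uniform(-1, 1) for p in product(V, V)}
f31 = {p: random.uniform(-1, 1) for p in product(V, V)}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Q_emptyset, Q_{1}, Q_{1,2} from Example 4.2 with every nu factor equal to 1.
Q_empty = mean(f12[x1, x2] * f23[x2, x3] * f31[x3, x1]
               for x1, x2, x3 in product(V, repeat=3))
Q_1 = mean(f12[x1, x2] * f12[y1, x2] * f31[x3, x1] * f31[x3, y1]
           for x1, y1, x2, x3 in product(V, repeat=4))
Q_12 = mean(f12[x1, x2] * f12[y1, x2] * f12[x1, y2] * f12[y1, y2]
            for x1, y1, x2, y2 in product(V, repeat=4))
```

Chaining the two inequalities gives $|Q_\emptyset| \le Q_{\{1,2\}}^{1/4} = \|f_{\{1,2\}}\|_{\Box^{\{1,2\}}}$, which is the unweighted form of Theorem 3.8 in this example.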



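Now we prove Theorem 3.8.

Proof [of Theorem 3.8] Pick any $e_0 \in H$. It suffices to show that

\[ \mathbb{E}\Big( \prod_{e \in H} f_e(\pi_e(x)) \,\Big|\, x \in V_J \Big) = O(\|f_{e_0}\|_{\Box^{e_0}}) + o_{N \to \infty}(1). \]

Applying Proposition 4.1 repeatedly, we see that

\[ |Q_\emptyset| \le (1 + o_{N \to \infty}(1)) |Q_{e_0}|^{1/2^d}. \]

On the other hand, direct computation shows that

\[ Q_\emptyset = \mathbb{E}\Big( \prod_{e \in H} f_e(\pi_e(x)) \,\Big|\, x \in V_J \Big). \]

Thus it suffices to show that

\[ Q_{e_0} = O(\|f_{e_0}\|^{2^d}_{\Box^{e_0}}) + o_{N \to \infty}(1). \]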

We may expand 

Qeo=E( n /eo(4o') n n ^^(^i"^) 

Lo£{Q,l}'0 eGff\{eo} uje{0,l}^:ujj=0 for all jee\eo 

we{o,i}=o 

where W{x'e^ ,x)^g) is the cube counting function 

eeH\{eo} a'e{0,l}«"^=o 

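On the other hand, by definition of the $\Box^{e_0}$ norm we have

\[ \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^{e_0}} f_{e_0}(x_{e_0}^{(\omega)}) \,\Big|\, x_{e_0}^{(0)}, x_{e_0}^{(1)} \in V_{e_0} \Big) = \|f_{e_0}\|^{2^d}_{\Box^{e_0}}. \]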

Thus by the triangle inequality, it will suffice to show that 

E((W(4°),xW) - 1) n /eo(4o')l4?,^iJ^ e Veo) = OjV^cx>(l). 

we{o,i}'=o 

Applying (16) and Cauchy-Schwarz, it suffices to show that 

E(|PF(4°),a;W) - Ij" J] ^eo(4:^)l4^ G Ko) = Oiv^oo(l) 
we{o,i}«o 

for n = 0, 2. Expanding this out, it suffices to show that 

E(W^(40),ar«)" n '^eo(4o^)l4?,4'o^ e Feo) = l + 0;v^^(l) 

wG{0,l}=o 

for n = 0, 1, 2. But the left-hand side can be rewritten as 

n 

E([ n n n '^^c^'^^)] n ^eMz')\^f^4^ ^vj) 

eeH\{eo}i=lcjg{0,l} = :Wj=i for all iGe\eo we{0,l}=o 

and the claim thus follows from (5). ■

5. Dual functions and a uniform distribution property

We now turn to the proof of Theorem 3.9. As with [9], a key tool will be the notion
of dual function $\mathcal{D}_e f$ introduced in Definition 2.5. By definition of $\mathcal{D}_e$ and of the $\Box^e$
norm we observe the identity

\[ \mathbb{E}(f \mathcal{D}_e f) = \|f\|^{2^d}_{\Box^e} \tag{22} \]

for all $f : V_e \to \mathbb{R}$. Thus if $f$ is not Gowers uniform in the sense that $\|f\|_{\Box^e}$ is
large, then $f$ will have a large correlation with its dual function.

The next important observation, which is a direct consequence of the dual function
condition (Definition 2.7), is that if $f$ is bounded pointwise by $\nu_e + 1$, then the dual
function is uniformly bounded:

\[ \mathcal{D}_e f(x_e^{(0)}) = O(1) \text{ for all } x_e^{(0)} \in V_e. \tag{23} \]

We now come to a deeper property of dual functions, namely that a pseudorandom
measure $\nu_e$ is uniformly distributed with respect to arbitrary polynomial combinations
of these functions.

Proposition 5.1 (Uniform distribution property). Let $V = (J, (V_j)_{j \in J}, d, H)$ be
a hypergraph system, let $(\nu_e)_{e \in H}$ be a system of pseudorandom measures, and let
$e \in H$. Let $K$ be a finite set, and for each $k \in K$ let $f_k : V_e \to \mathbb{R}$ be a function
such that

\[ |f_k(x_e)| \le \nu_e(x_e) + 1 \text{ for all } x_e \in V_e. \tag{24} \]

Then we have

\[ \mathbb{E}\Big( (\nu_e(x_e) - 1) \prod_{k \in K} \mathcal{D}_e f_k(x_e) \,\Big|\, x_e \in V_e \Big) = o_{N \to \infty; K}(1). \tag{25} \]

As in [9, Lemma 6.3], the key feature here is that $K$ is allowed to be arbitrarily
large.

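Proof We may use the trick of using Lemma 2.15 (conceding a factor of $2^{|K|}$) to
replace the hypothesis (24) by the stronger hypothesis

\[ |f_k(x_e)| \le \nu_e(x_e) \text{ for all } x_e \in V_e. \tag{26} \]

Let us write $g := \nu_e - 1$. By relabeling we may assume that $0, 1 \notin K$. For any
$e' \subseteq e$, we introduce the quantity $Q_{e'}$, defined as

\[ Q_{e'} := \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^e : \omega_j = 0 \text{ for all } j \in e \setminus e'} g(x_e^{(\omega)}) \prod_{k \in K} \prod_{\omega \in (\{0,k\}^{e \setminus e'} \setminus 0^{e \setminus e'}) \times \{0,1\}^{e'}} f_k(x_e^{(\omega)}) \,\Big|\, x_e^{(0)} \in V_e;\ x_{e'}^{(1)} \in V_{e'};\ x_{e \setminus e'}^{(k)} \in V_{e \setminus e'} \text{ for all } k \in K \Big). \]

Example 5.2. If $e = \{1,2\}$, then

\[ Q_\emptyset = \mathbb{E}\Big( g(x_1,x_2) \prod_{k \in K} f_k(x_1,x_2^{(k)}) f_k(x_1^{(k)},x_2) f_k(x_1^{(k)},x_2^{(k)}) \,\Big|\, x_1, x_1^{(k)} \in V_1;\ x_2, x_2^{(k)} \in V_2 \text{ for } k \in K \Big) \]

\[ Q_{\{1\}} = \mathbb{E}\Big( g(x_1,x_2) g(x_1',x_2) \prod_{k \in K} f_k(x_1,x_2^{(k)}) f_k(x_1',x_2^{(k)}) \,\Big|\, x_1, x_1' \in V_1;\ x_2, x_2^{(k)} \in V_2 \text{ for } k \in K \Big) \]

\[ Q_{\{1,2\}} = \mathbb{E}( g(x_1,x_2) g(x_1',x_2) g(x_1,x_2') g(x_1',x_2') \mid x_1, x_1' \in V_1;\ x_2, x_2' \in V_2 ). \]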

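We claim the following analogue of Proposition 4.1.

Proposition 5.3 (Cauchy-Schwarz). Let the notation and assumptions be as above.
Then for any $e' \subsetneq e$ and $j \in e \setminus e'$, we have

\[ |Q_{e'}| \le O_K(|Q_{e' \cup \{j\}}|^{1/2}). \]

If we assume this proposition, then by iterating it we obtain

\[ \Big| \mathbb{E}\Big( (\nu_e(x_e) - 1) \prod_{k \in K} \mathcal{D}_e f_k(x_e) \,\Big|\, x_e \in V_e \Big) \Big| = |Q_\emptyset| \]

\[ = O_K(|Q_e|^{1/2^d}) \]

\[ = O_K\Big( \Big| \mathbb{E}\Big( \prod_{\omega \in \{0,1\}^e} g(x_e^{(\omega)}) \,\Big|\, x_e^{(0)}, x_e^{(1)} \in V_e \Big) \Big|^{1/2^d} \Big) \]

\[ = O_K\Big( \Big| \sum_{A \subseteq \{0,1\}^e} (-1)^{|A|}\, \mathbb{E}\Big( \prod_{\omega \in A} \nu_e(x_e^{(\omega)}) \,\Big|\, x_e^{(0)}, x_e^{(1)} \in V_e \Big) \Big|^{1/2^d} \Big) \]

\[ = O_K\Big( \Big| \sum_{A \subseteq \{0,1\}^e} (-1)^{|A|} (1 + o_{N \to \infty}(1)) \Big|^{1/2^d} \Big) \]

\[ = o_{N \to \infty; K}(1) \]

as desired, where we have used (5) and the binomial formula $\sum_{A \subseteq \{0,1\}^e} (-1)^{|A|} = (1 - 1)^{|\{0,1\}^e|} = 0$.
Thus it remains to prove the proposition. To control $Q_{e'}$, we
organize the variables $x_e^{(0)}$, $x_{e'}^{(1)}$, $x_{e \setminus e'}^{(k)}$ into three groups $x, y, z$, where $x \in X$
collects all the variables other than $x_j^{(0)}$ and the $x_j^{(k)}$, and

\[ y := x_j^{(0)} \in Y := V_j \]

\[ z := (x_j^{(k)})_{k \in K} \in Z := V_j^K. \]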
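We can then factorize

\[ Q_{e'} = \mathbb{E}( \mathbb{E}(F(x,y) \mid y \in Y)\, \mathbb{E}(G(x,z) \mid z \in Z) \mid x \in X ) \]

where

\[ F(x,y) := \prod_{\omega \in \{0\}^{e \setminus e'} \times \{0,1\}^{e'}} g(x_e^{(\omega)}) \prod_{k \in K} \prod_{\omega \in (\{0,k\}^{e \setminus (e' \cup \{j\})} \setminus 0^{e \setminus (e' \cup \{j\})}) \times \{0\}^{\{j\}} \times \{0,1\}^{e'}} f_k(x_e^{(\omega)}) \]

and

\[ G(x,z) := \prod_{k \in K} \prod_{\omega \in \{0,k\}^{e \setminus (e' \cup \{j\})} \times \{k\}^{\{j\}} \times \{0,1\}^{e'}} f_k(x_e^{(\omega)}). \]

Applying Cauchy-Schwarz, we then have

\[ |Q_{e'}| \le \mathbb{E}( |\mathbb{E}(F(x,y) \mid y \in Y)|^2 \mid x \in X )^{1/2}\, \mathbb{E}( |\mathbb{E}(G(x,z) \mid z \in Z)|^2 \mid x \in X )^{1/2}. \]

By using the definition of $F$, we have

\[ \mathbb{E}( |\mathbb{E}(F(x,y) \mid y \in Y)|^2 \mid x \in X ) = Q_{e' \cup \{j\}}. \]

Thus it will suffice to show that

\[ \mathbb{E}( |\mathbb{E}(G(x,z) \mid z \in Z)|^2 \mid x \in X ) = O_K(1). \tag{27} \]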

We expand the left-hand side and use (26) to estimate this by 

E( ]J Yl e X, xf\xf^ e Vj for aU k K) 

feeif a>e{0,fc}=\(e'|-'O})x{fe,fe'}O}x{0,i}e' 

where $k \mapsto k'$ is some arbitrary bijection from the label set $K$ to a disjoint label
set $K'$ of equal cardinality. Expanding out $x$, we can factorize this expression as

where 

L{x%yx^^) := E( n ^Me^)\^^^(e'^m e V,\(,,^{,}y,xf\xf^ e Vj) 

w6{0,fe} = \(='u{j})x{/c,/c'}O} x{0,l}=' 

for some arbitrary label k G K (the exact value of k is irrelevant). But after 
relabeling, we have 

= E( n ^e(a:i'*'')k«,,^^,^ e ye\(e'UO});4°^4'^ ^ F,) 

^£{0,1}"= 

where 

^(^i\W'^i\0})^=E( n Mxi-^)\xf\xf^ GV,). 

a;e{0,l}« 

By Minkowski's inequality (i.e. the triangle inequality in i^), we have 

E(i.(.(°) .j,.«)-|.i°) e V^^,,y,x^} e V,fl- < E(M(.f^) .j)-|.(°) G K.^jy)^/-, 

and hence by the correlation condition (6) we obtain (27) as required. ■ 

An immediate corollary of Proposition 5.1 and the triangle inequality is

Corollary 5.4 (Uniform distribution property with respect to polynomials). Let
$V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph system, let $(\nu_e)_{e \in H}$ be a system of pseudorandom
measures, and let $e \in H$. Let $K$ be a finite set, let $D \ge 0$ be an integer, and
let $P : \mathbb{R}^K \to \mathbb{R}$ be a polynomial of degree $D$ in $K$ variables, with all coefficients
bounded by some quantity $M$. For each $k \in K$ let $f_k : V_e \to \mathbb{R}$ be a function such
that

\[ |f_k(x)| \le \nu_e(x) + 1 \text{ for all } x \in V_e. \tag{28} \]

Then we have

\[ \mathbb{E}\big( (\nu_e(x) - 1) P((\mathcal{D}_e f_k(x))_{k \in K}) \,\big|\, x \in V_e \big) = o_{N \to \infty; K, D, M}(1). \tag{29} \]

(Recall we allow our constants to depend implicitly on $J$.)

Remark 5.5. Following [9], one could also extend this corollary from polynomials to
continuous functions using the Weierstrass approximation theorem (and the bound
(23)), but we will not need to do so here.

6. $\sigma$-ALGEBRAS OF DUAL FUNCTIONS

We continue the proof of Theorem 3.9. As in [9, Theorem 8.1], we will exploit the
above uniform distribution property to associate a $\sigma$-algebra to every dual function.
We first give a minor variant of [9, Proposition 7.2]:

Proposition 6.1 (Each bounded function generates a $\sigma$-algebra). Let $V = (J, (V_j)_{j \in J}, d, H)$
be a hypergraph system, let $(\nu_e)_{e \in H}$ be a system of pseudorandom measures, and
let $e \in H$. Let $0 < \varepsilon < 1$ and $0 < \sigma < 1/2$ be parameters, let $I$ be an interval in
$\mathbb{R}$, and let $G : V_e \to I$ be a function. Then, if the pseudorandomness parameter $N$
is sufficiently large depending on $\varepsilon, \sigma$, there exists a $\sigma$-algebra $\mathcal{B}_{\varepsilon,\sigma,e}(G)$ on $V_e$ with
the following properties:

• ($G$ lies in its own $\sigma$-algebra) For any $\sigma$-algebra $\mathcal{B}$ on $V_e$, we have

\[ |G(x) - \mathbb{E}(G \mid \mathcal{B}_{\varepsilon,\sigma,e}(G) \vee \mathcal{B})(x)| \le \varepsilon \text{ for all } x \in V_e. \tag{30} \]

• (Bounded complexity) $\mathcal{B}_{\varepsilon,\sigma,e}(G)$ is generated by at most $O_{\varepsilon,I}(1)$ atoms.

• (Approximation by polynomials of $G$) If $A$ is any atom in $\mathcal{B}_{\varepsilon,\sigma,e}(G)$, then
there exists a polynomial $P_{A,\varepsilon,\sigma,I}$ of degree $O_{\varepsilon,\sigma,I}(1)$ and all coefficients
$O_{\varepsilon,\sigma,I}(1)$, such that $P_{A,\varepsilon,\sigma,I}(x) = O(1)$ for all $x \in I$ and

\[ \mathbb{E}\big( |1_A(x) - P_{A,\varepsilon,\sigma,I}(G(x))| (\nu_e(x) + 1) \,\big|\, x \in V_e \big) = O(\sigma). \tag{31} \]
Proof Observe from Fubini's theorem that

\[ \int_0^1 \sum_{n \in \mathbb{Z}} \mathbb{E}\big( 1_{G(x) \in [\varepsilon(n - \sigma + \alpha),\, \varepsilon(n + \sigma + \alpha)]} (\nu_e(x) + 1) \,\big|\, x \in V_e \big)\, d\alpha = 2\sigma\, \mathbb{E}(\nu_e(x) + 1 \mid x \in V_e). \]

Since $\nu_e$ is pseudorandom, we have

\[ \mathbb{E}(\nu_e(x) + 1 \mid x \in V_e) = O(1) \tag{32} \]

if $N$ is large enough. Thus by the pigeonhole principle we can find $0 \le \alpha \le 1$ such
that

\[ \sum_{n \in \mathbb{Z}} \mathbb{E}\big( 1_{G(x) \in [\varepsilon(n - \sigma + \alpha),\, \varepsilon(n + \sigma + \alpha)]} (\nu_e(x) + 1) \,\big|\, x \in V_e \big) = O(\sigma). \tag{33} \]

We now set $\mathcal{B}_{\varepsilon,\sigma,e}(G)$ to be the $\sigma$-algebra whose atoms are the sets $G^{-1}([\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha)))$
for $n \in \mathbb{Z}$ (discarding all the empty atoms, of course). This
is well-defined since the intervals $[\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha))$ tile the real line. Since $G$
takes values in $I$ we see that there are only $O_{\varepsilon,I}(1)$ non-empty atoms.

It is clear that if $\mathcal{B}$ is an arbitrary $\sigma$-algebra on $V_e$, then on any atom of $\mathcal{B} \vee \mathcal{B}_{\varepsilon,\sigma,e}(G)$,
the function $G$ takes values in an interval of diameter $\varepsilon$, which yields (30). Now we
verify the approximation by polynomials property. Let $A := G^{-1}([\varepsilon(n + \alpha), \varepsilon(n + 1 + \alpha)))$
be an atom. Since $G$ takes values in $I$, we may assume that
$n = O_{I,\varepsilon}(1)$, since $A$ is empty otherwise; note that this already establishes the
bounded complexity property. By combining Urysohn's lemma with the Weierstrass
approximation theorem, we can find a polynomial $P_{A,\varepsilon,\sigma,I}$ which equals $1 + O(\sigma)$
on $[\varepsilon(n + \alpha + \sigma), \varepsilon(n + \alpha + 1 - \sigma)]$, equals $O(\sigma)$ on $I \setminus [\varepsilon(n + \alpha - \sigma), \varepsilon(n + \alpha + 1 + \sigma)]$,
and equals $O(1)$ on all of $I$. Furthermore, a simple compactness argument shows
that the degree of $P_{A,\varepsilon,\sigma,I}$ can be chosen to be $O_{\varepsilon,\sigma,I}(1)$, and all the coefficients can
also be chosen to be $O_{\varepsilon,\sigma,I}(1)$. We have the pointwise estimate

\[ |1_A(x) - P_{A,\varepsilon,\sigma,I}(G(x))| = O(\sigma) + O\Big( \sum_{m=n}^{n+1} 1_{[\varepsilon(m + \alpha - \sigma),\, \varepsilon(m + \alpha + \sigma)]}(G(x)) \Big) \]

so by applying (33) and (32) we obtain (31). ■



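We specialize this Proposition to functions $G$ which are dual functions, to conclude
the following analogue of [9, Proposition 7.3].

Proposition 6.2. Let $V = (J, (V_j)_{j \in J}, d, H)$ be a hypergraph system, let $(\nu_e)_{e \in H}$
be a system of pseudorandom measures, and let $e \in H$. Let $K$ be an integer, and
for each $1 \le k \le K$ let $f_k : V_e \to \mathbb{R}$ be a function such that (28) holds. Let
$0 < \varepsilon < 1$ and $0 < \sigma < 1/2$ be parameters, and let $\mathcal{B}_{\varepsilon,\sigma,e}(\mathcal{D}_e f_k)$ for $1 \le k \le K$ be
constructed as in Proposition 6.1 (note from (23) that we can take $I$ to be a fixed
interval of width $O(1)$). Let $\mathcal{B} := \bigvee_{k=1}^{K} \mathcal{B}_{\varepsilon,\sigma,e}(\mathcal{D}_e f_k)$. Then if $\sigma$ is sufficiently
small depending on $K$, $\varepsilon$, and $N$ is sufficiently large depending on $K$, $\varepsilon$, $\sigma$, $J$, $d$,
we have

\[ |\mathcal{D}_e f_k(x) - \mathbb{E}(\mathcal{D}_e f_k \mid \mathcal{B})(x)| \le \varepsilon \text{ for all } 1 \le k \le K,\ x \in V_e. \tag{34} \]

Furthermore there exists a set $\Omega \in \mathcal{B}$ obeying the smallness condition

\[ \mathbb{E}( (\nu_e(x) + 1) 1_\Omega(x) \mid x \in V_e ) = O_{K,\varepsilon}(\sigma^{1/2}) \tag{35} \]

and such that

\[ \mathbb{E}(\nu_e - 1 \mid \mathcal{B})(x) = O_{K,\varepsilon}(\sigma^{1/2}) \text{ for all } x \in V_e \setminus \Omega. \tag{36} \]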



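Proof The claim (34) follows immediately from (30). Now we prove (35) and
(36). Since each of the $\mathcal{B}_{\varepsilon,\sigma,e}(\mathcal{D}_e f_k)$ are generated by $O_\varepsilon(1)$ atoms, we see that $\mathcal{B}$ is
generated by $O_{K,\varepsilon}(1)$ atoms. Call an atom $A$ of $\mathcal{B}$ small if $\mathbb{E}((\nu_e(x) + 1) 1_A(x) \mid x \in V_e) \le \sigma^{1/2}$,
and let $\Omega$ be the union of all the small atoms. Then clearly $\Omega$ lies in $\mathcal{B}$
and obeys (35). To prove the remaining claim (36), it suffices to show that

\[ \frac{\mathbb{E}((\nu_e(x) - 1) 1_A(x) \mid x \in V_e)}{\mathbb{E}(1_A(x) \mid x \in V_e)} = o_{N \to \infty; K, \varepsilon, \sigma}(1) + O_{K,\varepsilon}(\sigma^{1/2}) \tag{37} \]

for all atoms $A$ in $\mathcal{B}$ which are not small. However, by definition of "small" we
have

\[ \mathbb{E}((\nu_e(x) - 1) 1_A(x) \mid x \in V_e) + 2\mathbb{E}(1_A(x) \mid x \in V_e) = \mathbb{E}((\nu_e(x) + 1) 1_A(x) \mid x \in V_e) \ge \sigma^{1/2}. \]

Thus to complete the proof of (37) it will suffice (since $\sigma$ is small and $N$ is large)
to show that

\[ \mathbb{E}((\nu_e(x) - 1) 1_A(x) \mid x \in V_e) = o_{N \to \infty; K, \varepsilon, \sigma}(1) + O_K(\sigma). \tag{38} \]

On the other hand, since $A$ is the intersection of atoms $A_k \in \mathcal{B}_{\varepsilon,\sigma,e}(\mathcal{D}_e f_k)$ for each
$1 \le k \le K$, we see from Proposition 6.1 (and Hölder's inequality) that we can find
a polynomial $P : \mathbb{R}^K \to \mathbb{R}$ of degree $O_{\varepsilon,\sigma,K}(1)$ and coefficients $O_{\varepsilon,\sigma,K}(1)$ such that

\[ \mathbb{E}\big( (\nu_e(x) + 1) |1_A(x) - P(\mathcal{D}_e f_1(x), \ldots, \mathcal{D}_e f_K(x))| \,\big|\, x \in V_e \big) = O_K(\sigma) \]

so in particular

\[ \mathbb{E}\big( (\nu_e(x) - 1) (1_A(x) - P(\mathcal{D}_e f_1(x), \ldots, \mathcal{D}_e f_K(x))) \,\big|\, x \in V_e \big) = O_K(\sigma). \]

On the other hand, from Corollary 5.4 we have

\[ \mathbb{E}\big( (\nu_e(x) - 1) P(\mathcal{D}_e f_1(x), \ldots, \mathcal{D}_e f_K(x)) \,\big|\, x \in V_e \big) = o_{N \to \infty; K, \varepsilon, \sigma}(1). \]

The claim (38) now follows from the triangle inequality.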



7. A FURSTENBERG TOWER, AND THE PROOF OF THEOREM 3.9

We are now ready to prove Theorem 3.9. As in [9], this theorem shall be proven by
constructing a Furstenberg tower of increasingly complex $\sigma$-algebras.

Fix $V$, $e$, $\nu_e$, $f_e$, $\varepsilon$. We shall need a parameter $0 < \sigma \ll \varepsilon$ which we shall choose
later, and then we shall assume $N$ is sufficiently large depending on $\sigma$ and $\varepsilon$.

To construct $\mathcal{B}_e$ and $\Omega_e$ we shall iteratively construct a sequence of basic Gowers
anti-uniform functions $\mathcal{D}_e F_{e,1}, \ldots, \mathcal{D}_e F_{e,K}$ on $V_e$, exceptional sets $\Omega_{e,0} \subseteq \Omega_{e,1} \subseteq
\ldots \subseteq \Omega_{e,K} \subseteq V_e$, and a nested sequence of $\sigma$-algebras $\mathcal{B}_{e,0} \subseteq \ldots \subseteq \mathcal{B}_{e,K}$ for some
integer $K \ge 0$ as follows.

• Step 0. Initialize $K = 0$, and define $\mathcal{B}_{e,0} := \{\emptyset, V_e\}$ and $\Omega_{e,0} := \emptyset$.

• Step 1. Set $F_{e,K+1} := (1 - 1_{\Omega_{e,K}})(f_e - \mathbb{E}(f_e \mid \mathcal{B}_{e,K}))$. If we have

\[ \|F_{e,K+1}\|_{\Box^e} \le \varepsilon^{1/2^{d+1}} \]

then we set $\Omega_e := \Omega_{e,K}$ and $\mathcal{B}_e := \mathcal{B}_{e,K}$, and successfully terminate the
algorithm.

• Step 2. If instead we have

\[ \|F_{e,K+1}\|_{\Box^e} > \varepsilon^{1/2^{d+1}}, \tag{39} \]

then we let $\mathcal{B}_{e,K+1} := \mathcal{B}_{e,K} \vee \mathcal{B}_{\varepsilon,\sigma,e}(\mathcal{D}_e F_{e,K+1})$, where $\mathcal{B}_{\varepsilon,\sigma,e}(\mathcal{D}_e F_{e,K+1})$ is
as in Proposition 6.1.

• Step 3. Locate an exceptional set $\Omega_{e,K+1} \supseteq \Omega_{e,K}$ in $\mathcal{B}_{e,K+1}$ obeying the
smallness condition

\[ \mathbb{E}( (\nu_e(x_e) + 1) 1_{\Omega_{e,K+1}}(x_e) \mid x_e \in V_e ) = O_{K,\varepsilon}(\sigma^{1/2}) \tag{40} \]

and such that we have the bound

\[ \mathbb{E}(\nu_e \mid \mathcal{B}_{e,K+1})(x_e) = 1 + O_{K,\varepsilon}(\sigma^{1/2}) \text{ for all } x_e \in V_e \setminus \Omega_{e,K+1}. \tag{41} \]

If such an exceptional set cannot be found, we terminate the algorithm with
an error; otherwise, we move on to Step 4.

• Step 4. Increment $K$ to $K + 1$, and return to Step 1.
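
The loop above can be sketched in a toy model. The code below is only an illustration under strong simplifying assumptions that are not those of the paper: $\nu_e \equiv 1$ (so no exceptional sets are needed and Step 3 is vacuous), $d = |e| = 2$, $V_e = V_1 \times V_2$ with small finite $V_1 = V_2$, and the $\sigma$-algebra of Proposition 6.1 replaced by level sets of the dual function at scale `eps` without the shift $\alpha$. The function `f` and all parameter values are hypothetical.

```python
import math
from itertools import product

n, eps, max_steps = 6, 0.001, 50
points = list(product(range(n), repeat=2))   # V_e = V_1 x V_2, so d = |e| = 2
# Hypothetical f_e with 0 <= f_e <= 1 = nu_e: indicator of "x1 + x2 is even".
f = {(a, b): 1.0 if (a + b) % 2 == 0 else 0.0 for a, b in points}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def cond_exp(g, labels):
    """E(g | B), where the atoms of B are the level sets of `labels`."""
    atoms = {}
    for p in points:
        atoms.setdefault(labels[p], []).append(g[p])
    return {p: mean(atoms[labels[p]]) for p in points}

def dual(F):
    """Dual function D_e F(x1, x2) = E(F(x1, y2) F(y1, x2) F(y1, y2) | y1, y2)."""
    return {(x1, x2): mean(F[x1, y2] * F[y1, x2] * F[y1, y2] for y1, y2 in points)
            for x1, x2 in points}

labels = {p: () for p in points}             # Step 0: trivial sigma-algebra
energies, terminated = [], False
for K in range(max_steps):
    cond = cond_exp(f, labels)
    energies.append(mean(cond[p] ** 2 for p in points))
    F = {p: f[p] - cond[p] for p in points}  # Step 1 (exceptional set is empty here)
    DF = dual(F)
    norm_pow4 = mean(F[p] * DF[p] for p in points)  # identity (22): E(F D F) = ||F||^(2^d)
    if norm_pow4 <= eps ** 0.5:              # i.e. ||F|| <= eps^(1/2^(d+1))
        terminated = True
        break
    # Step 2: adjoin the level sets of D F at scale eps, then loop (Step 4)
    labels = {p: labels[p] + (math.floor(DF[p] / eps),) for p in points}
```

In this example the first pass finds $\|F\|^4_{\Box^e} = 1/16$, the dual function separates the two parity classes, and the second pass terminates with the energy $\mathbb{E}(\mathbb{E}(f \mid \mathcal{B})^2)$ having increased from $1/4$ to $1/2$; this monotone increase is the mechanism behind the energy incrementation argument.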

Let $K_0$ be a large multiple of $1/\varepsilon$ to be chosen later. We claim that this algorithm
necessarily terminates without error in Step 1 in less than $K_0$ steps (so $K$ always
remains smaller than $K_0$), if $N$ is sufficiently large depending on $\varepsilon$ and $\sigma$. Assuming
this for the moment, then by construction we have (20), as well as the bounds

\[ \mathbb{E}( \nu_e(x_e) 1_{\Omega_e}(x_e) \mid x_e \in V_e ) = O_\varepsilon(\sigma^{1/2}) \]

and

\[ \mathbb{E}(\nu_e \mid \mathcal{B}_e)(x_e) - 1 = O_\varepsilon(\sigma^{1/2}) \text{ for all } x_e \in V_e \setminus \Omega_e, \]

where we use the hypothesis that $K = O(K_0) = O_\varepsilon(1)$. If we choose $\sigma$ sufficiently
small depending on $\varepsilon$, and then assume $N$ sufficiently large depending on $\varepsilon$ and
$\sigma$, we thus see that the right-hand sides of these bounds can be made as small as
desired, thus obtaining (18) and (19).

It remains to show that the algorithm does indeed terminate without error in less
than $K_0$ steps. We first show that it will not terminate with error in the first $K_0$
steps. To see this, observe that we only have to show that Step 3 can be executed
without error whenever $K < K_0$. But observe from (17) and (41) for step $K - 1$ (if
$K \ge 1$) or from the bound (3) (if $K = 0$) that we have the pointwise bound

\[ |\mathbb{E}(f_e \mid \mathcal{B}_{e,K})(x_e)| \le 1 + O_{K,\varepsilon}(\sigma^{1/2}) + o_{N \to \infty}(1) \text{ for all } x_e \notin \Omega_{e,K} \tag{42} \]

and hence by (17) again

\[ |F_{e,K+1}(x_e)| \le \nu_e(x_e) + 1 + O_{K,\varepsilon}(\sigma^{1/2}) + o_{N \to \infty}(1) \text{ for all } x_e \in V_e. \tag{43} \]

Applying (a slightly rescaled) version of (23), we conclude

\[ |\mathcal{D}_e F_{e,K+1}(x_e)| \le O(1) + O_{K,\varepsilon}(\sigma^{1/2}) + o_{N \to \infty}(1) \text{ for all } x_e \in V_e. \tag{44} \]

The claim now follows by letting $\Omega \in \mathcal{B}_{e,K+1}$ be the set defined in Proposition
6.2, using the family of functions $F_{e,1}, \ldots, F_{e,K+1}$ instead of $f_1, \ldots, f_K$ and then
setting $\Omega_{e,K+1} := \Omega_{e,K} \cup \Omega$.

The only other remaining possibility to eliminate is that the first $K_0$ loops of the
algorithm are executed without error or termination. We shall show this cannot
happen by establishing the energy incrementation inequality

\[ \mathbb{E}( (1 - 1_{\Omega_{e,j}}(x_e)) \mathbb{E}(f_e \mid \mathcal{B}_{e,j})(x_e)^2 \mid x_e \in V_e ) \ge \mathbb{E}( (1 - 1_{\Omega_{e,j-1}}(x_e)) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1})(x_e)^2 \mid x_e \in V_e ) + c^2 \varepsilon \tag{45} \]

for all $1 \le j \le K_0$, and some $c > 0$ independent of $\varepsilon$ or $\sigma$. On the other hand,
the quantity $\mathbb{E}((1 - 1_{\Omega_{e,j}}(x_e)) \mathbb{E}(f_e \mid \mathcal{B}_{e,j})(x_e)^2 \mid x_e \in V_e)$ is clearly bounded below by
zero, and bounded above by

\[ \mathbb{E}( (1 - 1_{\Omega_{e,j}}(x_e)) \mathbb{E}(\nu_e \mid \mathcal{B}_{e,j})(x_e)^2 \mid x_e \in V_e ) \le 1 + O_{j,\varepsilon}(\sigma^{1/2}) \]

thanks to (41). The two facts are contradictory by choosing $K_0$ to be a large
multiple of $1/\varepsilon$ if $\sigma$ is chosen sufficiently small.

It remains to prove (45). Since the algorithm successfully executed the first $K_0$
loops, we have

\\F,j\\ne > s^''^'. 
Raising this to the power 2'^~^, and using (22), we conclude 

where we are using the usual inner product 

{f,g) ■.^E{f{xe)9{xe)\xeeVe). 
On the other hand, from (43), (44), (40) wc have 

(since j = 0{Ko) = Oe(l)) and hence by the triangle inequality 

((1 - lo..,)Fe,„Pei^e,,) > e'^' " Oe{a'/^) 

On the other hand, from (34) we have 

VeFejiXe) = E{VeF^j\Be,j){Xe) + 0{s) for all Xe € Ve 

and hence by (43) and (3) 

((1 - ln^JFej,VeFej - E{VeFej\Be,j)) = 0{e). 
We conclude that 

((1 - In^JFejM'^eFejlBej)) > s'/^ - 0,{a'/^) 
Since 1q^ ^. is measurable in Bgj, we obtain 

((1 - la^JE{Fej\Be,j)M1^eFe,j\Be,j)) > s'/^ - 0,{a'/^). 
By (23) and Cauchy-Schwarz we conclude 

11(1 - ln..,)E(Fe,,|^e,,)IU== > C£V2 _ o,(aV2) 
for some c > independent of e, ct, and where 

\\f\\L^ = {.f,.fy/'^-E{fi^e)\Xe&Vey/'. 

Using the definition of Fej, we conclude 

11(1 - 1,,„ J(E(/e|Se,,-l) - E(/e|ee,,))||L^ > cs'^' " 0,{a'/'). 

We now use the cosine rule to conclude 

11(1 - lo.,,)E(/e|Se,,)||i2 > 11(1 - lo.,,)E(/,|B,,,_i)||i. + c'e ~ O,{o^l^) 

+ 2((1- lo.,,)[E(/e|Se,,) -E(/e,Se,,-l)],(l " In. jE(/e|6e,,-l)) • 

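The inner product here can be rewritten as

\[ \langle [\mathbb{E}(f_e \mid \mathcal{B}_{e,j}) - \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1})],\ (1 - 1_{\Omega_{e,j}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1}) \rangle. \]

Now observe that the quantity in square brackets has zero conditional expectation
with respect to $\mathcal{B}_{e,j-1}$, and in particular is orthogonal to $(1 - 1_{\Omega_{e,j-1}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1})$.
Thus the above inner product can be rewritten as

\[ \langle \mathbb{E}(f_e \mid \mathcal{B}_{e,j}) - \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1}),\ (1_{\Omega_{e,j-1}} - 1_{\Omega_{e,j}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1}) \rangle. \]

Observe that the second factor is measurable in $\mathcal{B}_{e,j}$, and so the inner product can
be rewritten again as

\[ \langle f_e - \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1}),\ (1_{\Omega_{e,j-1}} - 1_{\Omega_{e,j}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1}) \rangle. \]

Using (41) we have $\mathbb{E}(f_e \mid \mathcal{B}_{e,j-1}) = O(1)$ outside of $\Omega_{e,j-1}$, and so from this and (40),
(17) we see that this inner product is $O_{j,\varepsilon}(\sigma^{1/2}) = O_\varepsilon(\sigma^{1/2})$, since $j = O(K_0) =
O_\varepsilon(1)$. Summarizing all the above computations, we conclude that

\[ \|(1 - 1_{\Omega_{e,j}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j})\|^2_{L^2} \ge \|(1 - 1_{\Omega_{e,j}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1})\|^2_{L^2} + c^2 \varepsilon - O_\varepsilon(\sigma^{1/2}). \]

On the other hand, from (42), (40) we have

\[ \|(1 - 1_{\Omega_{e,j}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1})\|^2_{L^2} = \|(1 - 1_{\Omega_{e,j-1}}) \mathbb{E}(f_e \mid \mathcal{B}_{e,j-1})\|^2_{L^2} + O_\varepsilon(\sigma^{1/2}) \]

and we thus conclude (45), if $\sigma$ is sufficiently small depending on $\varepsilon$. This concludes
the proof of Theorem 3.9. ■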



8. Constellations in the Gaussian primes: preliminaries 

We now begin the proof of Theorem 1.2. In this section we shall reduce matters
to the point where we can apply Theorem 2.17, at which point the only remaining
task will be to establish that a certain family of measures $(\nu_e)_{e \in H}$ constructed here
is pseudorandom. This will then be achieved in the next section.

By making the substitution $a \mapsto a - r v_0$ if necessary we may take $v_0 = 0$. By adding
some dummy elements to the $v_j$ if necessary, we may assume the ergodic hypothesis
that the $v_j$ (and hence their differences $v_i - v_j$) generate $\mathbb{Z}[i]$ as an additive group.
Such a maneuver is terrible for the quantitative bounds, but for the question of
merely establishing infinitely many prime constellations, it is harmless. In fact by
adding a few more dummy elements we can impose the following slightly stronger
hypothesis:

Hypothesis 8.1 (Improved ergodic hypothesis). Ifi,j are two distinct elements 
of J, then the vectors {vk — Vj : k ^ span Z[z] as an additive group. 

This hypothesis is not strictly necessary for our argument, but it does simplify 
matters slightly. 

Henceforth we allow all implicit constants in the $O()$ and $o()$ notation to depend on
$k$ and $v_0, \ldots, v_{k-1}$. We will also use $C, c > 0$ to denote various positive constants
(possibly depending on the above parameters) which can vary from line to line.

To avoid confusion let us use the terminology rational prime to denote a prime in
the natural numbers, and Gaussian prime to denote a prime in $\mathbb{Z}[i]$. Thus for
instance $5 = (2+i)(2-i)$ is a rational prime but not a Gaussian prime. Similarly we
use rational integer to denote an element of $\mathbb{Z}$. We let $\mathbb{Z}[i]^\times := \{1, -1, i, -i\}$ denote
the Gaussian units, that is the invertible elements in $\mathbb{Z}[i]$. Let us call two non-zero
Gaussian integers associate if their quotient is a Gaussian unit, and non-associate
otherwise.

Given any non-zero Gaussian integer $z$, we define its norm $N(z)$ to be the quantity
$N(z) := |\mathbb{Z}[i]/z\mathbb{Z}[i]|$; it is easy to verify that $N(zz') = N(z)N(z')$ and $N(a + bi) =
a^2 + b^2$. As is well known (see e.g. [10]), $N(P[i])$ consists of the number 2, as
well as the rational primes equal to 1 modulo 4, and the squares of the rational
primes equal to 3 modulo 4. Of these three cases, the second case is by far the most
prevalent. As the other two cases cause some minor difficulty, we shall remove them
by defining the unexceptional Gaussian primes $P[i]'$ to be those Gaussian primes
p e Z[i] such that N(p) is a rational prime equal to 1 modulo 4, and define Z[i]g^ to 
be those non-zero square-free Gaussian integers whose prime factorization consists 
only of unexceptional Gaussian primes (and Gaussian units, of course). Note that 
unexceptional Gaussian primes have non-zero real part and non-zero imaginary 
part. Wc define P[i]'+ C P[i]' to be those unexceptional Gaussian primes which 
lie in the first quadrant; thus every unexceptional Gaussian prime is conjugate to 
exactly one prime in -P[j]'^. 
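The classification of Gaussian primes by their norms can be checked numerically on a small box. The following is a minimal sketch, not part of the original argument; the function names and the box size are illustrative choices, and Gaussian integers $a + bi$ are represented as pairs of Python integers.

```python
from math import isqrt

def is_rational_prime(n: int) -> bool:
    """Trial-division primality test for rational integers."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def norm(a: int, b: int) -> int:
    """Norm N(a + bi) = a^2 + b^2."""
    return a * a + b * b

def is_gaussian_prime(a: int, b: int) -> bool:
    """a + bi is a Gaussian prime iff its norm is a rational prime, or it is
    a unit multiple of a rational prime congruent to 3 modulo 4."""
    if a == 0 or b == 0:
        p = abs(a + b)
        return is_rational_prime(p) and p % 4 == 3
    return is_rational_prime(norm(a, b))

def is_unexceptional(a: int, b: int) -> bool:
    """Unexceptional: the norm is a rational prime equal to 1 modulo 4."""
    n = norm(a, b)
    return is_rational_prime(n) and n % 4 == 1

def norm_is_of_expected_type(n: int) -> bool:
    """True if n is 2, a prime = 1 mod 4, or the square of a prime = 3 mod 4."""
    r = isqrt(n)
    return (n == 2
            or (is_rational_prime(n) and n % 4 == 1)
            or (r * r == n and is_rational_prime(r) and r % 4 == 3))

# Norms of all Gaussian primes found in a small box around the origin.
prime_norms = sorted({norm(a, b)
                      for a in range(-12, 13) for b in range(-12, 13)
                      if is_gaussian_prime(a, b)})
```

Every norm collected in `prime_norms` falls into one of the three cases listed above, and only the middle case is retained by `is_unexceptional`.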

Clearly, in order to obtain infinitely many constellations in $P[i]$ it suffices to obtain
infinitely many constellations in $P[i]'$. The first main task is to obtain a number-theoretic
pseudorandom majorant $\nu$ for the unexceptional Gaussian primes, or more
precisely for a weight function adapted to a variant of the unexceptional Gaussian
primes in which all the non-uniformity arising from small divisors has been eliminated.
Recall that every rational prime $p$ equal to 1 modulo 4 is the norm of exactly
eight unexceptional Gaussian primes (two of which lie in $P[i]'_+$). From Dirichlet's
theorem (in the modulo 4 case) and the prime number theorem we thus have

\{p e P[i\' : N{p) < N'}\ = (2 + o;v^^(l))^. (46) 

Remark 8.2. This bound is of course consistent with (a very simple case of) the 
Chebotarev density theorem. 
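The count of eight unexceptional Gaussian primes above each rational prime equal to 1 modulo 4 (two of them in the first quadrant) can be confirmed by brute force. This is an illustrative sketch only; the helper names and the range of primes tested are assumptions made for the example.

```python
from math import isqrt

def is_rational_prime(n: int) -> bool:
    """Trial-division primality test for rational integers."""
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def gaussian_integers_of_norm(p: int):
    """All pairs (a, b) with a^2 + b^2 = p."""
    r = isqrt(p)
    return [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)
            if a * a + b * b == p]

def counts(p: int):
    """(number of Gaussian primes of norm p, number lying in the first quadrant)."""
    reps = gaussian_integers_of_norm(p)
    first_quadrant = [(a, b) for (a, b) in reps if a > 0 and b > 0]
    return len(reps), len(first_quadrant)

primes_1_mod_4 = [p for p in range(2, 200) if is_rational_prime(p) and p % 4 == 1]
table = {p: counts(p) for p in primes_1_mod_4}
```

For example, `counts(5)` picks up $\pm 1 \pm 2i$ and $\pm 2 \pm i$, of which $1 + 2i$ and $2 + i$ lie in the first quadrant.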



We now adopt a Gaussian integer version of the "W-trick" from [9], whose purpose
is to eliminate non-uniformities in the Gaussian primes which arise from small divisors.
Let $N$ be a large rational prime; we view this as a parameter which will
eventually be sent to infinity. Let $w = w(N)$ be a positive rational integer which
grows very slowly to infinity as $N \to \infty$; thus we can write any expression of the
form $o_{w \to \infty}(1)$ as $o_{N \to \infty}(1)$, and any specified expression of the form $o_{N \to \infty; w}(1)$
as $o_{N \to \infty}(1)$; we shall frequently take advantage of these facts in the sequel without
further comment. We let $W = W(N) := \prod_{p \in P[i] : N(p) < w} N(p)$ be the product of
the norms of all the Gaussian primes of norm less than $w$; note that the growth
comments about $w$ apply just as well to $W$, thus for instance any specified expression
of the form $o_{W \to \infty}(1)$ or $o_{N \to \infty; W}(1)$ can be written as $o_{N \to \infty}(1)$. We can
partition $\mathbf{Z}[i]$ into cosets $W \cdot \mathbf{Z}[i] + b$, where $b \in [0, W)^2$. Let $\phi_{\mathbf{Z}[i]}(W)$ denote the
number of Gaussian integers in $[0, W)^2$ which are coprime to $W$. We also need a
small number $\varepsilon > 0$ depending on $k, v_1, \ldots, v_k$, to be chosen later.

$^1$ Specifically, the exceptional primes cause an unwelcome irregularity, namely that the Gaussian
primes have an anomalous density on certain lines, such as the real or imaginary axes, which will
disrupt the pseudorandomness hypothesis we impose later. This is ultimately due to the fact that
the field $\mathbf{Z}[i]/p\mathbf{Z}[i]$ does not have rational prime order if $p$ is exceptional.

By (46) we have

\{P e P[i]' : < N(p) < {eNWf}\ > c. 



for some $c_\varepsilon > 0$, if $N$ is sufficiently large depending on $\varepsilon$ (and $W$ is slowly growing
with respect to $N$). We caution that the value of $c_\varepsilon$ will vary from line to line. By
the pigeonhole principle$^2$ we can find a $b \in [0, W)^2$ coprime to $W$ such that

for some slightly different $c_\varepsilon > 0$, where $A_b \subset \mathbf{Z}[i]$ is the set

Ab := {n G Z[i\ : ^e^^^ < p^^^) < ^2^2. + 6 G P[i\'}. 

Fix such a $b$. It would now suffice to establish the quantitative estimate
{(a,r) G Z[i]xZ : a + rvj G A^ for < j < fc}| 

> (Ce - Ojv^cx);e(l)) 



Note that the contribution of the degenerate cases $r = 0$ becomes negligible for $N$
large enough.
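The quantity $\phi_{\mathbf{Z}[i]}(W)$ used in the W-trick can be computed directly for small moduli. The sketch below is illustrative only: it accepts an arbitrary positive rational integer `W` rather than the specific product defined in the text, and it relies on the standard fact that $a + bi$ is coprime to a rational integer $W$ in $\mathbf{Z}[i]$ exactly when $\gcd(a^2 + b^2, W) = 1$.

```python
from math import gcd

def coprime_residues(W: int):
    """Gaussian integers a + bi in [0, W)^2 that are coprime to W."""
    return [(a, b) for a in range(W) for b in range(W)
            if gcd(a * a + b * b, W) == 1]

def phi_gauss(W: int) -> int:
    """Number of Gaussian integers in [0, W)^2 coprime to W."""
    return len(coprime_residues(W))

def density_boost(W: int) -> float:
    """Ratio N(W) / phi(W) = W^2 / phi(W): the factor by which primes are
    concentrated once one restricts to residue classes coprime to W."""
    return W * W / phi_gauss(W)
```

The pigeonhole step then selects one of the `phi_gauss(W)` admissible residues $b$; the ratio returned by `density_boost` is the normalisation that recurs in the estimates that follow.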

Let $Z := \mathbf{Z}/N\mathbf{Z}$, and let $\pi : \mathbf{Z}[i] \to Z^2$ be the obvious projection map. If $\varepsilon$ is
sufficiently small depending on $v_1, \ldots, v_k$, and $N$ is large enough depending on
$\varepsilon, v_1, \ldots, v_k$, it now suffices to show that




{W)logN 



ae Z^;re Z 



for some $c_\varepsilon > 0$.

Remark 8.3. This bound is consistent with the analogue of the Hardy-Littlewood 
prime tuples conjecture for Gaussian primes; see [13]. Of course, our work here 
does not make any serious progress towards that conjecture. 



From (47) we have



/^Z^HOlogTV 

^ 1 ^^iA,){x) 



X e Z^ \ > Ce > 0. 



(48) 



$^2$ One could also use the Gaussian integer analogue of Dirichlet's theorem at this point, but it
is unnecessary for this argument.
The next step is to construct a suitable pseudorandom measure on $Z^2$ so that
we may invoke Theorem 2.17. One could modify the truncated divisor sums of
Goldston and Yildirim (as used in [9] for the rational primes) directly. However we
take advantage of a slight simplification to their approach introduced in [24] which
uses less information on the Gaussian integer $\zeta$-function (in particular, using only
the very crude zero-free region in the vicinity of the pole at $s = 1$) to obtain a
qualitatively similar result in a slightly more elementary fashion.

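Define the Möbius function $\mu_{\mathbf{Z}[i]} : \mathbf{Z}[i] \to \mathbf{R}$ for the Gaussian integers by setting
$\mu_{\mathbf{Z}[i]}(n) := (-1)^m$ when $n \in \mathbf{Z}[i]$ is the product of $m$ pairwise non-associate Gaussian
primes, and zero otherwise. Similarly, we define the von Mangoldt function
$\Lambda_{\mathbf{Z}[i]} : \mathbf{Z}[i] \to \mathbf{R}^+$ for the Gaussian integers by setting $\Lambda_{\mathbf{Z}[i]}(n) := \log N(p)$ if $n$
is associate to a power of a Gaussian prime $p$, and equal to zero otherwise. From
unique factorization in $\mathbf{Z}[i]$, one easily verifies the identities

$$\log N(n) = \frac{1}{4} \sum_{d \in \mathbf{Z}[i] \setminus \{0\} : d | n} \Lambda_{\mathbf{Z}[i]}(d)$$

$$\Lambda_{\mathbf{Z}[i]}(n) = \frac{1}{4} \sum_{d \in \mathbf{Z}[i] \setminus \{0\} : d | n} \mu_{\mathbf{Z}[i]}(d) \log N(n/d)$$

for all $n \in \mathbf{Z}[i] \setminus \{0\}$; the factor of $\frac{1}{4}$ is due to the four Gaussian units $1, -1, i, -i$.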
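The two identities, including the factor of $\frac{1}{4}$ coming from the units, can be verified by brute force for small Gaussian integers. The following is a minimal sketch under the assumption that Gaussian integers are stored as pairs `(a, b)`; all helper names are hypothetical, and divisors are found by exhaustive search rather than by any efficient factorisation.

```python
from math import isqrt, log

def norm(z):
    return z[0] * z[0] + z[1] * z[1]

def divides(d, n):
    """True if the non-zero Gaussian integer d divides n."""
    nd = norm(d)
    x = n[0] * d[0] + n[1] * d[1]   # real part of n * conj(d)
    y = n[1] * d[0] - n[0] * d[1]   # imaginary part of n * conj(d)
    return x % nd == 0 and y % nd == 0

def quotient(n, d):
    """Exact quotient n / d, assuming d divides n."""
    nd = norm(d)
    return ((n[0] * d[0] + n[1] * d[1]) // nd, (n[1] * d[0] - n[0] * d[1]) // nd)

def divisors(n):
    """All Gaussian divisors of n, including units and associates."""
    r = isqrt(norm(n))
    return [(a, b) for a in range(-r, r + 1) for b in range(-r, r + 1)
            if (a, b) != (0, 0) and divides((a, b), n)]

def is_rational_prime(p):
    return p >= 2 and all(p % d for d in range(2, isqrt(p) + 1))

def is_gaussian_prime(z):
    a, b = z
    if a == 0 or b == 0:
        p = abs(a + b)
        return is_rational_prime(p) and p % 4 == 3
    return is_rational_prime(norm(z))

def factor(n):
    """Map each prime divisor of n (one representative per associate class,
    normalised to a > 0 and b >= 0) to its multiplicity."""
    mult = {}
    for p in divisors(n):
        if p[0] > 0 and p[1] >= 0 and is_gaussian_prime(p):
            m, k = n, 0
            while divides(p, m):
                m, k = quotient(m, p), k + 1
            mult[p] = k
    return mult

def mobius(n):
    mult = factor(n)
    return 0 if any(k > 1 for k in mult.values()) else (-1) ** len(mult)

def mangoldt(n):
    mult = factor(n)
    return log(norm(next(iter(mult)))) if len(mult) == 1 else 0.0

def identities_hold(n, tol=1e-9):
    ds = divisors(n)
    first = abs(log(norm(n)) - sum(mangoldt(d) for d in ds) / 4) < tol
    second = abs(mangoldt(n)
                 - sum(mobius(d) * log(norm(quotient(n, d))) for d in ds) / 4) < tol
    return first and second
```

For instance $3 + 4i = (2 + i)^2$ has twelve divisors (four units, four associates of $2 + i$, four associates of $3 + 4i$), and both identities reduce to $\log 25 = 2 \log 5$ and $\Lambda(3 + 4i) = \log 5$.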

We now smoothly truncate the above formula for $\Lambda_{\mathbf{Z}[i]}(n)$ to obtain a truncated
divisor sum of Goldston-Yildirim type, and also restrict to the unexceptional Gaussian
integers $\mathbf{Z}[i]'_{\mathrm{sf}}$. Let $R := N^c$ for some small $c = c_k > 0$ to be chosen later (e.g.
Cfc = 2"-'^°*"^ would suffice). Let </? : R ^ R"*" be a smooth bump function"^ sup- 
ported on [—1, 1] which equals 1 at (any standard bump function would do here), 
and define 

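$$\Lambda_{\mathbf{Z}[i]', R, \varphi}(n) := \frac{1}{4} \log N(R) \sum_{d \in \mathbf{Z}[i]'_{\mathrm{sf}} : d | n} \mu_{\mathbf{Z}[i]}(d) \varphi\Big(\frac{\log N(d)}{\log N(R)}\Big). \tag{49}$$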

One observes that ^Z[i] R<fii^) ~ logN(i?) whenever n is an unexceptional Gauss- 
ian prime with N(n) > N(i?); in particular, this is true whenever < N(n) < 
g2jY2_ \y^e now define the function v : ^ R^ by 

u{n) := C^ lU 'Z^,n, when N(.-(n)) < e^TV^ 

y 1 otherwise (50) 

where $c_\varphi > 0$ is a normalization factor depending only on $\varphi$ to be chosen later
(it is the constant which ensures that $\nu$ has mean close to 1), and $\pi^{-1} : Z^2 \to
(-N/2, N/2)^2$ is the inverse of $\pi$ taking values in the fundamental domain $(-N/2, N/2)^2$.



$^3$ Any standard bump function will do here. The actual truncated divisor sum corresponding
to Goldston-Yildirim corresponds to the choice $\varphi(x) := \max(1 - |x|, 0)$, which offers the advantage
that all integrals involving $\varphi$ can (in principle) be worked out explicitly, but has only a limited
amount of regularity which necessitates knowledge of the zero-free region on the axis $\mathrm{Re}\, s = 1$ in
order to proceed.
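To make the truncated divisor sum concrete, here is a small numerical sketch. It assumes that (49) has the form $\frac{1}{4} \log N(R) \sum_{d} \mu_{\mathbf{Z}[i]}(d) \varphi(\log N(d) / \log N(R))$, with $d$ ranging over the divisors of $n$ in $\mathbf{Z}[i]'_{\mathrm{sf}}$, and it uses one particular smooth bump function; both the bump and the helper names are illustrative assumptions. Since each square-free divisor has four associates, the sum over $d$ is computed as four times a sum over subsets of the unexceptional prime classes dividing $n$, which cancels the factor $\frac{1}{4}$.

```python
from itertools import combinations
from math import exp, isqrt, log, prod

def bump(x: float) -> float:
    """A smooth bump supported on [-1, 1] with bump(0) = 1 (illustrative choice)."""
    return exp(1.0 - 1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def is_rational_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def divides(d, n) -> bool:
    """True if the non-zero Gaussian integer d divides n (pairs (re, im))."""
    nd = d[0] * d[0] + d[1] * d[1]
    return ((n[0] * d[0] + n[1] * d[1]) % nd == 0
            and (n[1] * d[0] - n[0] * d[1]) % nd == 0)

def unexceptional_prime_norms(n):
    """Norms of the unexceptional Gaussian primes dividing n, one entry per
    associate class (first-quadrant representatives)."""
    r = isqrt(n[0] * n[0] + n[1] * n[1])
    return [a * a + b * b
            for a in range(1, r + 1) for b in range(1, r + 1)
            if (a * a + b * b) % 4 == 1
            and is_rational_prime(a * a + b * b)
            and divides((a, b), n)]

def truncated_divisor_sum(n, R: int) -> float:
    """Truncated divisor sum at n for a rational integer R >= 2."""
    log_NR = log(R * R)  # N(R) = R^2 for a rational integer R
    norms = unexceptional_prime_norms(n)
    total = 0.0
    for size in range(len(norms) + 1):
        for subset in combinations(norms, size):
            # Mobius sign (-1)^size; the norm of the divisor is the product of norms.
            total += (-1) ** size * bump(log(prod(subset)) / log_NR)
    return log_NR * total
```

With `R = 3` (so $N(R) = 9$), the prime $3 + 2i$ of norm 13 gives exactly $\log N(R)$, since only the unit divisors survive the truncation, whereas $2 + i$ of norm 5 gives a strictly smaller value.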
By construction we see that $\nu$ is non-negative and

"^")=^- N(WO 

whenever $n \in \pi(A_b)$. In particular, from (48) we have

Our task is now to show that

$$\mathbf{E}\Big( \prod_{0 \le j < k} (\nu 1_{\pi(A_b)})(a + r v_j) \Big| a \in Z^2; r \in Z \Big) \ge c_{\varepsilon, \varphi} - o_{N \to \infty; \varepsilon, \varphi}(1)$$

for some $c_{\varepsilon, \varphi} > 0$. Since the $v_j$ were assumed to contain zero and span $\mathbf{Z}^2$, they
will also span $Z^2$ if the prime $N$ is sufficiently large. We thus see that the maps
$\phi_j(r) := v_j r$ obey the ergodicity hypothesis in Theorem 2.17. We can invoke that
theorem (in the contrapositive) and be done as soon as we establish

Proposition 8.4 (Existence of a system of pseudorandom majorants). Consider 
the hypergraph system {J,{Z)j^j,d,H), where J :— {0,... , — 1}, d := k — 2, 
H := (^), and for each e = J\{i} G H, define the function v^: Z*^ ^ R"*" hy 

$$\nu_e((x_j)_{j \in e}) := \nu\Big(\sum_{j \in e} (v_j - v_i) x_j\Big). \tag{51}$$

Then, if the constant $c_\varphi$ is chosen properly, and if $\varepsilon$ is sufficiently small depending
on $k, v_1, \ldots, v_k$, the system $(\nu_e)_{e \in H}$ is a pseudorandom system of measures, i.e. it
obeys the dual function condition, the linear forms condition, and the correlation
condition. Note that we take $N$ to be the pseudorandomness parameter, and allow
our bounds to depend on $\varepsilon, k, v_1, \ldots, v_k, \varphi$.



It remains to prove the above proposition. We shall do this in stages. First we reduce
matters from controlling various estimates involving $\nu_e$ to estimates involving
$\nu$. More precisely, we will deduce Proposition 8.4 from the following two propositions,
whose proof we shall give in later sections. We first need some notation.

Definition 8.5 (Gaussian T-tuples). Let $T$ be a finite set. If $L = (L_t)_{t \in T} \in \mathbf{Z}[i]^T$
is a $T$-tuple of Gaussian integers, and $x = (x_t)_{t \in T} \in Z^T$ is a $T$-tuple of elements of
$Z$, we define the quantity $L \cdot x \in Z^2$ to be the quantity

$$L \cdot x := \Big( \sum_{t \in T} \mathrm{Re}(L_t) x_t, \sum_{t \in T} \mathrm{Im}(L_t) x_t \Big).$$

We say that two $T$-tuples $L, L'$ are incommensurate if they are both not identically
zero, and we have $L \neq qL'$ and $L \neq q\overline{L'}$ for any Gaussian rational $q \in \mathbf{Q}[i]$.
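The definition can be made concrete with a short sketch. It assumes that incommensurability excludes both $L = qL'$ and $L = q\overline{L'}$ for Gaussian rationals $q$ (the conjugate condition is a reading of the damaged source and should be treated as an assumption), and it tests proportionality over $\mathbf{Q}[i]$ by cross-multiplication, which avoids any division. Gaussian integers are pairs `(re, im)`; all function names are hypothetical.

```python
def dot(L, x):
    """L . x = (sum Re(L_t) x_t, sum Im(L_t) x_t) for integer x_t."""
    return (sum(l[0] * t for l, t in zip(L, x)),
            sum(l[1] * t for l, t in zip(L, x)))

def mul(z, w):
    """Product of two Gaussian integers."""
    return (z[0] * w[0] - z[1] * w[1], z[0] * w[1] + z[1] * w[0])

def conj(z):
    return (z[0], -z[1])

def is_zero(L):
    return all(l == (0, 0) for l in L)

def proportional(L, M):
    """True if L and M are linearly dependent over Q[i]:
    every 2x2 minor L_s M_t - L_t M_s vanishes."""
    idx = range(len(L))
    return all(mul(L[s], M[t]) == mul(L[t], M[s]) for s in idx for t in idx)

def incommensurate(L, M):
    """Both non-zero, and L proportional neither to M nor to conj(M)
    (the conjugate condition is an assumed reading of the definition)."""
    return (not is_zero(L) and not is_zero(M)
            and not proportional(L, M)
            and not proportional(L, [conj(m) for m in M]))
```

For example, $L = (1, i)$ and $L' = (1, -i)$ are conjugate to each other and so are not incommensurate under this reading, while $(1, 0)$ and $(0, 1)$ are.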



The basic point here is that if $L, L'$ are non-degenerate and incommensurate then
for any fixed $b, b' \in Z^2$ and some unknown $x \in Z^T$, there is no obvious correlation
between $L \cdot x + b$ being a Gaussian prime (or almost prime) and between $L' \cdot x + b'$
being a Gaussian prime (or almost prime), other than those arising from small
divisors (which have already been eliminated through the W-trick). Note that if all
the components of $L$ lie in a subspace of $\mathbf{C}$ (e.g. on the real axis), then $L \cdot x + b$ will
also be constrained to a line; however, as it turns out, this will not significantly affect
the densities and correlations of $\nu$ due to our removal of the exceptional primes.

We now formalize the above heuristics. 

Proposition 8.6 (Linear forms condition for $\mathbf{Z}[i]$). Let $S$ be a finite set of cardinality
$|S| \le k 2^k$, and $T$ be a finite set of cardinality $|T| \le 2k$. For each $s \in S$, let
$L_s \in \mathbf{Z}[i]^T$ be a $T$-tuple, with any two $L_s, L_{s'}$ with $s \neq s'$ being incommensurate.
Then, if the exponent $c_k$ used to define $R$ is sufficiently small depending on $k$, $W$ is
sufficiently large depending on $(L_s)_{s \in S}$, $c_\varphi$ in (50) is chosen correctly (depending
only on $\varphi$), and $\varepsilon$ is sufficiently small depending on $(L_s)_{s \in S}$, we have

E(n KL. • X + &,)|x e Z^) = 1 + o^^^^^_,_(L^)^^^(l) (52) 
ses 

uniformly for all choices of $(b_s)_{s \in S} \in (Z^2)^S$.

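Proposition 8.7 (Correlation condition for $\mathbf{Z}[i]$). Let $m \le 2^k$ and $v \in \mathbf{Z}[i] \setminus \{0\}$ be
arbitrary. Then, if $\varepsilon$ is sufficiently small depending on $m, v$, there exist functions
$\tau^{(l)} = \tau^{(l)}_{m,v} : Z^2 \to \mathbf{R}^+$ for $l = 1, 2, 3$ which are even (i.e. $\tau^{(l)}(-x) = \tau^{(l)}(x)$) which
obey the moment conditions

$$\mathbf{E}(\tau^{(l)}(x)^q | x \in Z^2) = O_{m,q}(1) \text{ for } l = 1, 2 \tag{53}$$

and

$$\mathbf{E}(\tau^{(3)}(0, x)^q | x \in Z) = O_{m,q}(1) \tag{54}$$

for all integers $1 < q < \infty$, and furthermore will obey the moment conditions

$$\mathbf{E}(\tau^{(l)}(v' \cdot x)^q | x \in Z) = O_{m,q,v'}(1) \text{ for } l = 1, 2 \tag{55}$$

for all integers $1 < q < \infty$ and $v' \in \mathbf{Z}[i] \setminus \{0\}$, if $\varepsilon$ is sufficiently small depending on
$m, v, v'$. Furthermore we have the correlation estimate

$$\mathbf{E}(\nu(v \cdot x + h_1) \ldots \nu(v \cdot x + h_m) | x \in Z) \le \sum_{1 \le i < j \le m} \Big( \tau^{(1)}(h_i - h_j) + \tau^{(2)}(\overline{v} h_i - v \overline{h_j}) \Big) + \sum_{1 \le j \le m} \tau^{(3)}(\overline{v} h_j - v \overline{h_j}) \tag{56}$$

for all $h_1, \ldots, h_m \in Z^2$ (not necessarily distinct), where the conjugation operation
$h \mapsto \overline{h}$ and the scalar multiplication operation $h \mapsto vh$ on $Z^2$ are inherited from the
corresponding operations on $\mathbf{Z}[i]$ in the obvious manner.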

The $\tau^{(1)}$ term in (56) appeared in [9]. The $\tau^{(2)}$ term is new and reflects the unavoidable
fact that $\nu(x)$ and $\nu(\overline{x})$ will be very strongly correlated$^4$, since if $p$ is a
Gaussian prime or almost prime then $\overline{p}$ will be also. Similarly, the $\tau^{(3)}$ term is new
and reflects the fact that $\nu$ will have an anomalous density on the real line (or on
multiples of that line by $v$). Note that while $\tau^{(3)}$ is ostensibly defined on $Z^2$, only
its values on $\{0\} \times Z$ are relevant, since this is where $\overline{v} h_j - v \overline{h_j}$ takes its values.

$^4$ At least, this is the case if $b$ is real. If $b$ is complex then one has to shift $x$ or $\overline{x}$ by a fixed
factor.

Proof [of Proposition 8.4 assuming Proposition 8.6 and Proposition 8.7] We have to
verify that $(\nu_e)_{e \in H}$ obeys the dual function condition (Definition 2.7), linear forms
condition (Definition 2.8) and the correlation condition (Definition 2.13). We begin
with the dual function condition. Fix $e = J \setminus \{i\}$ and $x^{(1)} \in V_e$. Using (4) to expand
out $\mathcal{D}_e(\nu_e + 1)$, and then using (51), it suffices to show that

E(n ^(E(«^- - e Ve) = 0(1) 



for all C {0, 1}'^\0'^. But this follows from Proposition 8.6; note that as the Vj are 



all distinct, each of the linear forms "^j^^ivj —Vi)x^'^ utilizes a distinct non-empty 



subset of the variables in xi^^ and so the hypotheses of that Proposition are easily 
verified. 

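We now verify the linear forms condition. By (51), it suffices to show that

$$\mathbf{E}\Big( \prod_{(e, \omega) \in S} \nu\Big(\sum_{j \in e} (v_j - v_i) x_j^{(\omega_j)}\Big) \Big| x^{(0)}, x^{(1)} \in V_J \Big) = 1 + o_{N \to \infty}(1) \tag{57}$$

for any finite set $S$ of pairs $(e, \omega)$ such that $e = J \setminus \{i\} \in H$, $\omega \in \{0, 1\}^e$. We can
parameterize the averaging variables by $(x_t)_{t \in T} \in Z^T$, where $T$ is the finite set
$T = J \times \{0, 1\}$. We can thus write the left-hand side of (57) as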



E j []i.(L,.x + 6,)|xeZ^j 



where for any (e,a;) G 5 with e — J\{i}, we have 

The hypothesis that the $v_j$ are all distinct ensures that the $L_{e,\omega}$ are non-zero. In
fact they are all pairwise incommensurate, because each $L_{e,\omega}$ has a different set of
non-zero co-ordinates. Thus (57) follows from Proposition 8.6, if $\varepsilon$ is sufficiently
small depending on the $L_{e,\omega}$, which in turn depend only on $k, v_1, \ldots, v_k$.

We now turn to the correlation condition. Fix $e = J \setminus \{i\} \in H$, $j \in e$, $K > 0$, and
$n_{e,\omega} \in \{0, 1\}$. The left-hand side of (6) can be expanded as



\a=0 VweAo / 



K 

"eUil'Mfi} 



where $A_a := \{\omega \in \{0, 1\}^e : n_{e,\omega} = 1; \omega_j = a\}$ and

$$h_\omega := \sum_{j' \in e \setminus \{j\}} (v_{j'} - v_i) x_{j'}^{(\omega_{j'})}.$$

By Cauchy-Schwarz and symmetry it suffices to show that 

E(E( n i.{{vj - v,)xj + h^)\xj e Zr^\x<•^^.y,x'^J^^.y e Z^\i^}) = Ok{1). 
ojeAo 
Applying Proposition 8.7 with $v := v_j - v_i \neq 0$, we have
E( iy{{vj - Vi)xj + h^)\xj G Z) 

where $\tau^{(l)} = \tau^{(l)}_{|A_0|, v}$, assuming of course that $\varepsilon$ is sufficiently small depending on
$k, v_1, \ldots, v_k$. By the triangle inequality, we can thus bound the left-hand side of
(6) by




2K 



Since |^o| = it thus suffices to show that 

for any distinct in Aq. 

Fix $\omega, \omega' \in A_0$. Let us first deal with the contribution of the $\tau^{(1)}(h_\omega - h_{\omega'})$ term. Observe that the
map from $(x^{(0)}_{e \setminus \{j\}}, x^{(1)}_{e \setminus \{j\}})$ to $h_\omega - h_{\omega'}$ is a group homomorphism from $Z^{e \setminus \{j\}} \times Z^{e \setminus \{j\}}$
to $Z^2$. Since $Z$ is a cyclic group of prime order, we thus see that the image of this
homomorphism is either $\{0\}$, $Z^2$, or a line of the form $\{v' \cdot x : x \in Z\}$ for some
$v' \in \mathbf{Z}[i]$. Also, all the fibers of this group homomorphism have the same cardinality
(they are all cosets of the same kernel). Since all the $v_j$ are distinct and $\omega \neq \omega'$,
we see that the image is not zero. If it is $Z^2$ then the claim now follows from
(53). If the image is a line, then the Gaussian integer $v' \in \mathbf{Z}[i]$ depends only on
$k, v_1, \ldots, v_k, \omega, \omega'$. Since the number of values of $\omega, \omega'$ is $O(1)$, we thus see that if
$\varepsilon$ is small enough depending on $k, v_1, \ldots, v_k$, the claim will now follow from (55).
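The trichotomy for the image (a point, a line, or all of $Z^2$) and the equal size of the fibers can be observed directly for a small prime. The sketch below is illustrative: the prime `N = 7` and the tuples used are arbitrary choices, and the map is $x \mapsto L \cdot x$ reduced modulo $N$.

```python
from collections import Counter
from itertools import product

def image_with_fibers(L, N: int) -> Counter:
    """Map each point of the image of x -> L . x (mod N) to the size of its fiber.
    L is a list of Gaussian integers given as (re, im) pairs; x ranges over Z_N^T."""
    return Counter(
        (sum(l[0] * t for l, t in zip(L, x)) % N,
         sum(l[1] * t for l, t in zip(L, x)) % N)
        for x in product(range(N), repeat=len(L)))

def describe(L, N: int):
    """Return (image size, set of distinct fiber sizes)."""
    fibers = image_with_fibers(L, N)
    return len(fibers), set(fibers.values())

N = 7
zero_map = describe([(0, 0), (0, 0)], N)          # image is a single point
line_map = describe([(1, 2), (2, 4), (3, 6)], N)  # all components on one line
full_map = describe([(1, 0), (0, 1)], N)          # surjective onto Z_N^2
```

In each case the image has size $1$, $N$ or $N^2$, and every point of the image has a fiber of the same cardinality, which is what allows averages over the domain to be rewritten as averages over the image.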

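The contribution of the $\tau^{(2)}(\overline{v} h_\omega - v \overline{h_{\omega'}})$ term is dealt with similarly; note that
the map from $(x^{(0)}_{e \setminus \{j\}}, x^{(1)}_{e \setminus \{j\}})$ to $\overline{v} h_\omega - v \overline{h_{\omega'}}$ is still a group homomorphism whose
image is not identically zero.

Finally, we control the contribution of $\tau^{(3)}$. Here we use the improved ergodic
hypothesis, Hypothesis 8.1. This implies that the map from $(x^{(0)}_{e \setminus \{j\}}, x^{(1)}_{e \setminus \{j\}})$ to $h_\omega$
is a surjective group homomorphism, and hence the map to $\overline{v} h_\omega - v \overline{h_\omega}$ has image
$\{0\} \times Z$. Thus the contribution of $\tau^{(3)}$ can be controlled purely by (54). ■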



9. Reduction to number-theoretic estimates



To conclude the proof of Theorem 1.2, we have to verify Proposition 8.6 and Proposition
8.7. These propositions are estimates on the function $\nu$, which was defined in
(50), partly in terms of the truncated divisor sum $\Lambda_{\mathbf{Z}[i]', R, \varphi}$, and partly in terms of
the constant function 1. In this section we reduce matters purely to estimation of
the truncated divisor sum. In particular we reduce Proposition 8.6 to the following
estimate.

Proposition 9.1 (First Goldston-Yildirim correlation estimate for $\mathbf{Z}[i]'_{\mathrm{sf}}$). Let $S, T$
be finite sets. For each $s \in S$, let $L_s \in \mathbf{Z}[i]^T$ be a $T$-tuple, with any two $L_s, L_{s'}$ with
$s \neq s'$ being incommensurate. For each $s \in S$, let $a_s \in \mathbf{Z}[i]$ be a Gaussian integer
coprime to $W$. Let $B \subset \mathbf{Z}^T$ be a product $B = \prod_{t \in T} I_t$ of intervals $I_t \subset \mathbf{Z}$, each
of length at least $R^{10|S|}$. Then, if $W$ is sufficiently large depending on $(L_s)_{s \in S}$, we
have

E ( n AZW',„K,^(^(L. • X) + a,)2|x e b] 
\ses ) 

= (1 + 0fl^oo.|5|jT|,vp,W^,(L3)3es(^) + °W^cx>;|S|,|T|,¥>,(L3)3€s(^)) (58) 

N0n]ogN(7?)Y^' 

•^ziijC^) ) 

for an explicit quantity c,^ > depending only on y. 



Similarly, we will reduce Proposition 8.7 to the following estimate. 

Proposition 9.2 (Second Goldston-Yildirim correlation estimate for $\mathbf{Z}[i]'_{\mathrm{sf}}$). Let
$m$ be a positive integer, let $b$ be a Gaussian integer coprime to $W$, and let $v$ be a
Gaussian integer. Let $h_1, \ldots, h_m$ be Gaussian integers such that the quantity

$$\Delta := \prod_{1 \le i < j \le m} N(h_i - h_j) \prod_{1 \le i \le j \le m} N(W(h_i \overline{v} - \overline{h_j} v) - \overline{b} v + b \overline{v}) \tag{59}$$

is non-zero. Let I C Z be an interval of length at least i?^"™. Then, if W is 
sufficiently large depending on v, we have 



Ein^z[i]',„ii,^w.+H+^') 



<0r, 



(60) 



where $P[i]'_{+,W}$ are those primes in $P[i]'_+$ which are coprime to $W$.



The presence of the rather unusual expression $W(h_i \overline{v} - \overline{h_j} v) - \overline{b} v + b \overline{v}$ in (59) can
be partially explained by the following observation: if $W(h_i \overline{v} - \overline{h_j} v) - \overline{b} v + b \overline{v} = 0$,
then we have

$$\overline{v}(W(h_i + nv) + b) = v \overline{(W(h_j + nv) + b)}$$

for all $n$. Thus there is likely to be a strong correlation between $W(h_i + nv) + b$
being prime or almost prime, and $W(h_j + nv) + b$ being prime or almost prime. This
correlation also occurs in the diagonal case $i = j$, reflecting the fact that $\Lambda_{\mathbf{Z}[i]', R, \varphi}$
is substantially larger on the real line (and on multiples of the real line by Gaussian
rationals of small height) than in general. Thus we expect the left-hand side of (60)
to be abnormally large when $\Delta$ is zero; it turns out that it can also be large when
$\Delta$ is very smooth (has many small prime factors).

We now show how these propositions imply Propositions 8.6 and 8.7. 

Proof [of Proposition 8.6 from Proposition 9.1] This shall follow the proof of
[9, Proposition 9.8]; the main idea is to discretize the domain to the point where
the boundary effects caused by the constraint $N(n) < \varepsilon^2 N^2$ in (50) are negligible.
In this proof we allow all constants to depend on $|S|$ and $|T|$, and $(L_s)_{s \in S}$.

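Let us view $Z^T$ as the discrete cube $(-N/2, N/2)^T$. If the constant $c_k$ used to define
$R$ is sufficiently small, we can find an integer $Q = Q(N)$ such that $N/Q > 2R^{10|S|}$
and $1/Q = o_{N \to \infty}(1)$ (thus $Q$ grows slowly with $N$). We partition $(-N/2, N/2)^T$
into $Q^{|T|}$ boxes $(B_\alpha)_{\alpha \in A}$, each of sidelength $\frac{N}{Q}(1 + o_{N \to \infty}(1))$. Then, up to
multiplicative errors of $1 + o_{N \to \infty}(1)$, the left-hand side of (52) is equal to

$$\mathbf{E}\Big( \mathbf{E}\Big( \prod_{s \in S} \nu(L_s \cdot x + b_s) \Big| x \in B_\alpha \Big) \Big| \alpha \in A \Big).$$

It thus suffices to show that

$$\mathbf{E}\Big( \mathbf{E}\Big( \prod_{s \in S} \nu(L_s \cdot x + b_s) \Big| x \in B_\alpha \Big) - 1 \Big| \alpha \in A \Big) = o_{N \to \infty}(1).$$

Writing $\nu = 1 + (\nu - 1)$ and expanding, it suffices to show that

$$\mathbf{E}\Big( \mathbf{E}\Big( \prod_{s \in S'} (\nu(L_s \cdot x + b_s) - 1) \Big| x \in B_\alpha \Big) \Big| \alpha \in A \Big) = o_{N \to \infty}(1) \tag{61}$$

for all non-empty $S' \subseteq S$.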



Fix $S'$. Let $D$ denote the disk $D := \{n \in \mathbf{Z}[i] : N(n) < \varepsilon^2 N^2\}$. We divide the boxes
$B_\alpha$ into three categories. We say that a box $B_\alpha$ is interior if $L_s \cdot x + b_s \in \pi(D)$
for all $s \in S'$ and $x \in B_\alpha$. We say that a box $B_\alpha$ is exterior if there exists an
$s \in S'$ such that $L_s \cdot x + b_s \notin \pi(D)$ for all $x \in B_\alpha$. We say that a box $B_\alpha$ is
borderline of type $s$ for some $s \in S'$ if $L_s \cdot x + b_s \in \pi(D)$ for at least one $x \in B_\alpha$
and $L_s \cdot y + b_s \notin \pi(D)$ for at least one $y \in B_\alpha$. Clearly every box is either interior,
exterior, or borderline for some $s \in S'$. From (50), the exterior boxes give a zero
contribution to (61). Now consider an interior box $B_\alpha$. For these boxes we claim
that

$$\mathbf{E}\Big( \prod_{s \in S'} (\nu(L_s \cdot x + b_s) - 1) \Big| x \in B_\alpha \Big) = o_{N \to \infty}(1).$$

Expanding out the product using the binomial formula, it suffices to show that

$$\mathbf{E}\Big( \prod_{s \in S''} \nu(L_s \cdot x + b_s) \Big| x \in B_\alpha \Big) = 1 + o_{N \to \infty}(1)$$

for all $S'' \subseteq S'$. At this point we need to make a technical remark concerning the
identification between elements of $\mathbf{Z}[i]$ and elements of $Z^2$, and between $Z^T$ and
$(-N/2, N/2)^T$. Currently, $x$ is viewed as an element of $Z^T$, and $b_s$ is an element
of $Z^2$, and so $L_s \cdot x + b_s$ is also an element of $Z^2$. But using $\pi$, this element of $Z^2$
is then considered to be an element of $\mathbf{Z}[i]$, which in fact lies in the disk $D$.




We now change this perspective, viewing $x$ now as an element of $(-N/2, N/2)^T$
(and $B_\alpha$ as a box of sidelengths $\approx N/Q$ inside $(-N/2, N/2)^T$). This makes $L_s \cdot x$
an element of $\mathbf{Z}[i]$ rather than $Z^2$, although the dimensions of the box $B_\alpha$ will keep
$L_s \cdot x$ constrained to a ball of radius $O_{L_s}(N/Q)$. We now wish to view $b_s$ as an
element of $\mathbf{Z}[i]$ also, but one has the freedom to modify $b_s$ by an element of $N\mathbf{Z}[i]$
in doing so. However, only one of these "lifts" of $b_s$ will place $L_s \cdot x + b_s$ in $D$.
Indeed, since $L_s \cdot x$ is constrained to a ball of radius much less than $N$, there is a
unique lift of $b_s$ in $\mathbf{Z}[i]$ (which by abuse of notation we shall continue to call $b_s$),
independent of the choice of $x$, for which $L_s \cdot x + b_s$ lies in $D$, now viewed as a
subset of $\mathbf{Z}[i]$ rather than $Z^2$. Applying (50), we can now write the left-hand side
as

mwy^^% ^Zw,,.,.(^L. . x + a.)|x . Ba) 

where $a_s := W b_s + b$. Applying Proposition 9.1 and choosing $c_\varphi := 1/c'_\varphi$, this
expression is equal to

(1) 

which is acceptable since $R$ is a small power of $N$, and $W$ is chosen to grow extremely
slowly in $R$.

It remains to control the contribution of the borderline boxes of type $s_0$ for some
$s_0 \in S'$. Since $|S'| = O(1)$, we may fix $s_0$. Bounding $\nu - 1$ in absolute value by
$\nu + 1$, and using several applications of Proposition 9.1, we can control

$$\mathbf{E}\Big( \prod_{s \in S'} (\nu(L_s \cdot x + b_s) - 1) \Big| x \in B_\alpha \Big)$$

crudely by $O(1)$. To conclude the proof it suffices to show that the number of
borderline boxes of type $s_0$ is small, in the sense that it is $o_{N \to \infty}(1)$ times the total
number $Q^{|T|}$ of boxes.

Observe that the set $\{L_{s_0} \cdot x + b_{s_0} : x \in B_\alpha\}$ has a diameter of $O(N/Q)$, where the
metric on $Z^2$ is the quotient metric inherited from $\mathbf{Z}[i]$. Since $B_\alpha$ is borderline of
type $s_0$, we conclude that

$$\{L_{s_0} \cdot x + b_{s_0} : x \in B_\alpha\} \subset \{n \in \mathbf{Z}[i] : |n| = \varepsilon N + O(N/Q)\},$$

where the annulus on the right-hand side is thought of as a subset of $Z^2$. Next,
observe that the map $x \mapsto L_{s_0} \cdot x + b_{s_0}$ is an affine homomorphism from $Z^T$ to
$Z^2$, and thus (since $Z$ has prime order) the image is an affine subspace of $Z^2$,
with the fibers at each point of this image having equal cardinality. Since $L_{s_0}$
is not identically zero, the image is either an affine line in $Z^2$ (with a "slope"
determined entirely by $L_{s_0}$) or is all of $Z^2$. In either case, we see from elementary
geometry that the proportion of points in this image which lie in the annulus
$\{n \in \mathbf{Z}[i] : |n| = \varepsilon N + O(N/Q)\}$ is $o_{Q \to \infty}(1)$ (indeed one can obtain the more precise
bound of $O(Q^{-1/2})$, because the circle $\{|n| = \varepsilon N\}$ has non-vanishing curvature,
though we will not need that improved bound here). Thus the proportion of boxes
which are borderline of type $s_0$ is $o_{Q \to \infty}(1)$, which is acceptable since $Q$ is growing
with $N$. ■
Proof [of Proposition 8.7 from Proposition 9.2] The arguments here are somewhat
similar to the derivation of Proposition 8.6 from Proposition 9.1 but are simpler
because we are only seeking upper bounds rather than asymptotics. On the other
hand, some number theory is required to control the expressions arising from Proposition
9.2.

Fix $m, v$. We first observe that we may remove the requirement that $\tau^{(l)}$ is even,
since we may simply replace $\tau^{(l)}(x)$ by $\tau^{(l)}(x) + \tau^{(l)}(-x)$ if necessary.

We begin by establishing the very crude estimate 

'^{v-x + hj)\x eZ) = Om,^ ( (log" TV) sup dz[i] ) ' 

j=l V neZ[i\:\n\<NW J (62) 

where $d_{\mathbf{Z}[i]}(n)$ is the Gaussian divisor function

$$d_{\mathbf{Z}[i]}(n) := \sum_{d \in \mathbf{Z}[i] : d | n} 1.$$

To see this, we first use Holder's inequality to bound the left-hand side of (56) very 
crudely by 

I sup i'{x) 

By (50) and (49), this can in turn be crudely estimated by 



/0Z[,j(WOlogN(i?)\'" 

V ^y^) J neZ[i]:N(„)<e2JV2 



deZ[i\'^^:d\Wn+b 



The claim (62) follows. Next, observe that $d_{\mathbf{Z}[i]}(n) = O_\varepsilon(N(n)^\varepsilon)$ whenever $n$ is a
power of a Gaussian prime $p$, and one can in fact improve this to $d_{\mathbf{Z}[i]}(n) \le N(n)^\varepsilon$ if
$N(p)$ is sufficiently large depending on $\varepsilon$. Using the multiplicativity of $d_{\mathbf{Z}[i]}(n)$ we
conclude that $d_{\mathbf{Z}[i]}(n) = O_\varepsilon(N(n)^\varepsilon)$ for all $n$, and so we see that the right-hand
side of (62) is $O_{m,\varphi,\varepsilon}(N^\varepsilon)$ for any $\varepsilon > 0$.
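The divisor bound can be explored numerically. The sketch below assumes the divisor function counts all Gaussian divisors, including units and associates, so that $d_{\mathbf{Z}[i]}(n)/4$ is multiplicative; the helper names and the search box are illustrative choices.

```python
from math import isqrt

def norm(z):
    return z[0] * z[0] + z[1] * z[1]

def divides(d, n) -> bool:
    """True if the non-zero Gaussian integer d divides n (pairs (re, im))."""
    nd = norm(d)
    return ((n[0] * d[0] + n[1] * d[1]) % nd == 0
            and (n[1] * d[0] - n[0] * d[1]) % nd == 0)

def divisor_count(n) -> int:
    """Number of Gaussian integers d dividing n, counting units and associates."""
    r = isqrt(norm(n))
    return sum(1 for a in range(-r, r + 1) for b in range(-r, r + 1)
               if (a, b) != (0, 0) and divides((a, b), n))

def worst_ratio(eps: float, box: int) -> float:
    """Largest value of d(n) / N(n)^eps over non-zero n in [-box, box]^2."""
    return max(divisor_count((a, b)) / norm((a, b)) ** eps
               for a in range(-box, box + 1) for b in range(-box, box + 1)
               if (a, b) != (0, 0))
```

For a prime power $p^j$ one has $d(p^j) = 4(j + 1)$ while $N(p^j) = N(p)^j$, which is the source of the $N(n)^\varepsilon$ bound; multiplicativity of $d/4$ over coprime factors then extends it to general $n$, and `worst_ratio` lets one watch the implied constant for a fixed `eps`.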

In light of (62), we will define $\tau^{(1)}(0)$, $\tau^{(2)}(W^{-1}(\overline{b} v - b \overline{v}))$ and $\tau^{(3)}(W^{-1}(\overline{b} v - b \overline{v}))$
to equal the right-hand side of (62), and observe from the preceding discussion that
this will not significantly affect (53), (54) or (55).

It now remains to treat the cases when $h_i - h_j \neq 0$ for all $1 \le i < j \le m$ and
$W(h_i \overline{v} - \overline{h_j} v) - \overline{b} v + b \overline{v} \neq 0$ for all $1 \le i \le j \le m$. In other words, we are left
with the case where the quantity $\Delta$ defined in (59) is non-zero. We now use (50)
to crudely estimate

t^<:^^r ^ZmmHw.,,HjWn-Hn)+br ^ 

^y^)^^ + ^^ N(W) l0gN(i?) ^N{^-Hn))<e-N- 
and expand terms, to reduce to establishing an estimate of the form



(\ ' 
N(W01ogNGR)\ 



J2 T^^\hi - hj) + T^^\hiV - hjv) + J2 r^^\hiV-hiv) 

l<i<j<m l<i<m 



for functions $\tau^{(1)}, \tau^{(2)}, \tau^{(3)}$ obeying (53), (54), (55), in the case $\Delta \neq 0$.
Using the identity 

E(/(a;)|a; e Z) = E(E(/(a; + n)|l <n< N'^/'^)\x e Z) 

we see it suffices to obtain an estimate of the form 



E n ^Z[^'^,,rJ^^~'(^ ■{x + n)+ h,) + 6)21n(.-.(„.(,+ 



n)+/i,))<e2JV2 



|l< n < N'^/'^ 



^ /N(W01ogNCR) Y 



T^'\hi - hj) + T^^\hiV - hjv) + J2 r^^\hiV-hiv) 

l<i<j<Tn 



l<i<m 



uniformly in x. By absorbing v ■ x into the hj term we may take a; = 0. We then 
observe that this sum is zero unless there exists an n for which | tt" ^ (w • n + /i^ ) | < eA'' 
for all j. In particular this forces 

\TT-^{hj)\ < eN + OilvlN^/"^) = 2eN 

if N is sufficiently large depending on e and \v\. In particular, by the triangle 
inequality, r only needs to be defined on the region {x G Z"^ : |7r~^(a;)| < 4eA^}. 
Now observe that 'k~^{v ■ n + hj) = 'K~^{hj) + nv (if e is sufficiently small to avoid 
wraparound issues). Setting hj := n~^{hj), we thus reduce to showing that 



(\ i 
<^Z[.](^) J 



y ^ fi{hi - hj) + f2{hiV - hjv) + f3(/ijt> - /ijw) 

l<i<j<Tn l<i<m 



for all distinct hj E {x E Z[i] : \x\ < 2eN}, and for functions fi, 72, 73 : Z[i]\{0} 
R"*" supported on the punctured disk D := {x E Z[i] : < < 4eA^}, such that 
the functions r^'^ := f; ott"-^ obeys (53), (54), (55). 
Applying Proposition 9.2, and assuming that $W$ is large enough depending on $v$,
and the exponent $c_k$ used to define $R$ is sufficiently small, we can bound the left-hand
side of (63) by

/ \ m 

( N{W) logN(i?) \ 



f ( N{W)logN{R) \ 



n n _ (i+o„.(N(p)-v2)) 

n n _ (l + 0™(N(p)-V2))^ 

^^''^'^peI'[i\'^_^:p\N{wChiV-hiv)-bv+'bv) 

and so by the arithmetic mean-geometric mean inequality we will be able to satisfy
(63) by setting

h{x) := lD{x)Om{ n (1 + 0„^(N(p)-l/2))-') 

P6P[i];,w^P|N(a:) 

and 

f2{x) = Mx) := 1d{x)0^{ n (1 + 0„(N(p)-i/2))™^) 

PeP[j]:,.,„,:p|N(Wx-6j7+6u) 

We now need to verify (53), (54), and (55). Let us first verify (53) for $\tau^{(1)}$. We
need to show that

xe-DpgP[j]^:p|N(a;) 

Estimating 

n (i+o„(N(p)-i/2))'"'«< n (i+o„,,(N(p)-i/2) 

p6PW;,w:p|N(a:) pePW;,„,:p|N(x) 

<Om,,i n (l + N(p)-l/4)) 

peP[i];,vK^p|N(x) ^g4^ 
< 0„,,( ^ N(d)-V4) 
<ieZ[i]'^,„,:d|N(x) 

where $\mathbf{Z}[i]'_{\mathrm{sf},W}$ are those elements of $\mathbf{Z}[i]'_{\mathrm{sf}}$ which are coprime to $W$. We then see
that

y: n (i+o„(N(p)-v^))-^«<o„„(5: ^ N(d)-V4) 

xenpgP[j]/^.p|N(^) ^6-DdeZ[i];^ „,:<i|N(a;) 

= 0,„,«( I] N{d)-'/^\{x € D : d\-N{x 

Since $d \in \mathbf{Z}[i]'_{\mathrm{sf},W}$ we have $d = p_1 \ldots p_k$ for some distinct (non-associate) $p_1, \ldots, p_k \in
P[i]'$. We see that if $d | N(x)$, then $d' | x$, where $d' = p'_1 \ldots p'_k$ and each $p'_j$ is either
associate to $p_j$ or to $\overline{p_j}$. There are $O(2^k)$ possible values of $d'$, and they all have
the same norm as $d$. We thus see that

$$|\{x \in D : d | N(x)\}| = O(2^k |D| / N(d)) = O(2^k N^2 / N(d)).$$

Since there are only finitely many Gaussian primes in any given bounded set, we
see that $2^k = O(N(d)^{1/8})$ (for instance). Thus we have

^ n (l + 0™(N(p)-i/2)r« = 0^,,( ^ N(d)-i/4N(d)i/«7VVN(d)) 

= OmAN' E N(d)-9/«) 
deZ[i]\{o} 

as desired. 

Now we verify (55) for $\tau^{(1)}$. If $v'$ is fixed, and $W, N$ are large with respect to $v'$,
then the set $\{x \in Z : \pi^{-1}(v' \cdot x) \in D\}$ is essentially a union of $O_{v'}(1)$ intervals of
length $O(N)$, on which $\pi^{-1}(v' \cdot x)$ is an arithmetic progression of step $r := \pi^{-1}(v')$.
It thus suffices to show that



E n (l + 0,n(N(p)-l/2))™^^ 

jeZ,:j=O(N),a+jrTt0 peP [i] :p|N(a+ jr) 



Or„,5,r(iV) 



where $a = O(N)$ is a Gaussian integer. Applying (64) again, we bound the left-hand
side by



O. 



( \ 

^jeZ:j=O{N),a+jrji0 deZ[i]^,:ci|N(a+jr) J 



--Om.a\ E N(d)-i/4|{jeZ:j = O(iV),d|N(a+jr),a + jr^0}| 



We can assume that $d = O_r(N)$ since the larger values of $d$ give a zero contribution
(recall that $a = O(N)$). As before, we can replace the constraint $d | N(a + jr)$ by
$d' | a + jr$, where $d'$ ranges over $O(2^k) = O(N(d)^{1/8})$ possible values, all with norm
equal to $N(d)$. The Gaussian integer $d'$ is (up to Gaussian units) the product of
primes $p$ in $P[i]'$, with no prime appearing more than once. For each of these primes,
the group $\mathbf{Z}[i]/p\mathbf{Z}[i]$ is a cyclic group of prime order, thus has no non-trivial proper subgroups.
From this fact and the Chinese remainder theorem for Gaussian integers, we see
that $\{j \in \mathbf{Z} : d' | j\} = N(d)\mathbf{Z}$. We can thus estimate the previous expression by



Or, 



E N(d)-V4N(d)V80(l + 



N 



\ 



as desired. 
Now we verify (53) for $\tau^{(2)}$. We need to show that

Yl n (1 + 0„(N(p)-V2))-(™-i)9 = OmAN^). 

a:eDpgP[j]^.p|N(Wx-6j7+5i;)7tO 

By modifying the computations in (64), (65) we see the left-hand side is 
OmA Yl ^{dr^^^llxeD-.dlNiWx-bv + bvj^O}]). 

As before, we see that if $d$ divides $N(Wx - \overline{b} v + b \overline{v})$, then $d'$ divides $Wx - \overline{b} v + b \overline{v}$,
where $d'$ ranges over $O(N(d)^{1/8})$ Gaussian integers in $\mathbf{Z}[i]'_{\mathrm{sf},W}$ with the same norm
as $d$. Since the radius of $D$ is large compared with $d$, $W$, $b$, or $v$, we see (using the
Chinese remainder theorem, since $d'$ and $W$ are coprime) that for any fixed $d'$, the
number of elements $x$ of $D$ for which $d'$ divides $Wx - \overline{b} v + b \overline{v}$ is $O(N^2/N(d')) =
O(N^2/N(d))$. The proof of (53) then proceeds as with $\tau^{(1)}$. For similar reasons we
can adapt the proof of (55) for $\tau^{(1)}$ to also give a proof for $\tau^{(2)}$, which then also
implies (54). ■



10. Proof of Proposition 9.1 



To conclude the proof of Theorem 1.2, we need to prove Proposition 9.1 and Proposition
9.2. This is the purpose of this section and the next. As in [9, Section 10]
or in the earlier work of Goldston-Yildirim, the idea is to first use the Chinese
remainder theorem to essentially replace $\mathbf{Z}[i]$ with the product of more local objects
such as $\mathbf{Z}[i]/p\mathbf{Z}[i]$. We then use the non-degeneracy hypotheses on the $L_s$
to compute the contribution of each local object, leaving us with an Euler product
over Gaussian primes, which we will estimate by using the pole and residue of the
modified Gaussian integer zeta function $\zeta_{\mathbf{Z}[i]'}(s)$ (which also has an Euler product
representation) at $s = 1$.

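We turn to the details, starting with the proof of Proposition 9.1. We begin by
eliminating the role of the box $B$. Using (49), we can write the left-hand side of
(58) as

$$\frac{\log^{2|S|} N(R)}{16^{|S|}}\, E\Big( \prod_{s \in S} \sum_{d_s, d'_s \in \mathbf{Z}[i]' :\ d_s, d'_s \mid W(L_s \cdot x) + a_s} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, \varphi\Big(\frac{\log N(d_s)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_s)}{\log N(R)}\Big) \ \Big|\ x \in B \Big).$$

From the support of $\varphi$, we may restrict the $d_s, d'_s$ summations to the range where
$N(d_s), N(d'_s) \le N(R)$. We can thus rearrange the above expression as

$$\frac{\log^{2|S|} N(R)}{16^{|S|}} \sum_{d_s, d'_s \in \mathbf{Z}[i]' :\ N(d_s), N(d'_s) \le N(R)\ \forall s \in S} \Big( \prod_{s \in S} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, \varphi\Big(\frac{\log N(d_s)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_s)}{\log N(R)}\Big) \Big)\, E\Big( \prod_{s \in S} 1_{[d_s, d'_s] \mid W(L_s \cdot x) + a_s} \ \Big|\ x \in B \Big) \qquad (66)$$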

where $[d, d']$ is the least common multiple of $d$ and $d'$ (this is only defined up to
association). Let $D = D(([d_s, d'_s])_{s \in S}) \in \mathbf{Z}^+$ be the smallest positive rational
integer which is a multiple of all of the $d_s, d'_s$ for $s \in S$. Since each of the $d_s, d'_s$
has norm at most $N(R) = R^2$, we have

$$D \le \prod_{s \in S} N(d_s) N(d'_s) \le R^{4|S|}.$$

On the other hand, $B$ has sidelength at least $R^{10|S|}$. Since the solutions $x$ to the
system

$$[d_s, d'_s] \mid W(L_s \cdot x) + a_s \hbox{ for all } s \in S$$

are periodic of period $D$ (of course, the period could in fact be smaller) in each
component of $x$, we thus conclude that

$$E\Big( \prod_{s \in S} 1_{[d_s, d'_s] \mid W(L_s \cdot x) + a_s} \ \Big|\ x \in B \Big) = \omega(([d_s, d'_s])_{s \in S}) + O_{|T|}(R^{-6|S|})$$

where $\omega$ is the expression

$$\omega((d_s)_{s \in S}) := E\Big( \prod_{s \in S} 1_{d_s \mid W(L_s \cdot x) + a_s} \ \Big|\ x \in (\mathbf{Z}/D\mathbf{Z})^T \Big). \qquad (67)$$

Note that $\omega$ is unchanged if one of its arguments is replaced by an associate, so
we may legitimately use expressions such as $[d_s, d'_s]$ in the arguments of $\omega$. The
contribution of the error term $O_{|T|}(R^{-6|S|})$ to (66) is at most

$$O_{|S|,|T|}\big( (\log^{2|S|} N(R))\, N(R)^{2|S|} R^{-6|S|} \big)$$

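which is certainly acceptable. Thus it suffices to show that

$$\sum_{d_s, d'_s \in \mathbf{Z}[i]' :\ N(d_s), N(d'_s) \le N(R)\ \forall s \in S} \Big( \prod_{s \in S} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, \varphi\Big(\frac{\log N(d_s)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_s)}{\log N(R)}\Big) \Big)\, \omega(([d_s, d'_s])_{s \in S})$$
$$= \big(1 + o_{R \to \infty; |S|,|T|,\varphi,W,(L_s)_{s \in S}}(1) + o_{w \to \infty; |S|,|T|,\varphi,(L_s)_{s \in S}}(1)\big) \Big( \frac{c'_\varphi\, N(W)}{\log N(R)\, \phi_{\mathbf{Z}[i]}(W)} \Big)^{|S|}$$

for some $c'_\varphi > 0$ depending only on $\varphi$. Now observe that since the $a_s$ are coprime
to $W$, so are $W(L_s \cdot x) + a_s$. Thus the above summand vanishes if any of the $d_s, d'_s$
share a common factor with $W$. Thus we reduce to showing that

$$\sum_{d_s, d'_s \in \mathbf{Z}[i]'_{sq,W} \hbox{ for all } s \in S} \Big( \prod_{s \in S} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, \varphi\Big(\frac{\log N(d_s)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_s)}{\log N(R)}\Big) \Big)\, \omega(([d_s, d'_s])_{s \in S})$$
$$= \big(1 + o_{R \to \infty; |S|,|T|,\varphi,W,(L_s)_{s \in S}}(1) + o_{w \to \infty; |S|,|T|,\varphi,(L_s)_{s \in S}}(1)\big) \Big( \frac{c'_\varphi\, N(W)}{\log N(R)\, \phi_{\mathbf{Z}[i]}(W)} \Big)^{|S|} \qquad (69)$$

where $\mathbf{Z}[i]'_{sq,W} := \{n \in \mathbf{Z}[i]'_{sq} : (n, W) = 1\}$. Also, we have taken advantage of the
supports of the $\varphi$ to drop the restrictions $N(d_s), N(d'_s) \le N(R)$.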

To proceed further, we need to understand the quantity $\omega((d_s)_{s \in S})$. This quantity
clearly ranges between 0 and 1, but much better estimates are possible. Firstly, we
observe that $\omega$ is partially multiplicative:

Lemma 10.1. If $d_s \in \mathbf{Z}[i]'_{sq,W}$ for all $s \in S$, we have

$$\omega((d_s)_{s \in S}) = \prod_{n \in N(P[i]') : n > w} \omega(((d_s, n))_{s \in S})$$

where $(d, n)$ is the greatest common divisor of $d$ and $n$ in the Gaussian integers
(defined up to association).



Proof. Observe from unique factorization (and the hypothesis $d_s \in \mathbf{Z}[i]'_{sq,W}$) that
solving the linear system

$$d_s \mid W(L_s \cdot x) + a_s \hbox{ for all } s \in S$$

is the same as solving the linear systems

$$(d_s, n) \mid W(L_s \cdot x) + a_s \hbox{ for all } s \in S$$

simultaneously for each $n \in N(P[i]')$ with $n > w$. Note that each individual linear
system is then periodic with period $n \in N(P[i]')$. Since all the elements of $N(P[i]')$
are rational primes, the claim then follows from the Chinese remainder theorem. ■
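
The multiplicativity in Lemma 10.1 can be sanity-checked by brute force on a toy instance. The sketch below is only an illustration under assumed data: the primes $2+i$ and $3+2i$, the value $W = 3$, the linear form and the shift are hypothetical choices that do not come from the paper, and Gaussian integers are represented as pairs (re, im).

```python
from fractions import Fraction
from itertools import product

def divides(d, z):
    """Return True if the Gaussian integer d = (a, b) divides z = (x, y)."""
    a, b = d
    x, y = z
    n = a * a + b * b                      # norm N(d)
    # z * conj(d) = (x*a + y*b) + i*(y*a - x*b) must be a multiple of N(d)
    return (x * a + y * b) % n == 0 and (y * a - x * b) % n == 0

def omega(ds, period, W, Ls, shifts):
    """Density of x in (Z/period Z)^T with d_s | W (L_s . x) + a_s for every s."""
    T = len(Ls[0])
    hits = 0
    for x in product(range(period), repeat=T):
        ok = True
        for d, L, a in zip(ds, Ls, shifts):
            re = W * sum(L[t][0] * x[t] for t in range(T)) + a[0]
            im = W * sum(L[t][1] * x[t] for t in range(T)) + a[1]
            if not divides(d, (re, im)):
                ok = False
                break
        if ok:
            hits += 1
    return Fraction(hits, period ** T)

# Hypothetical toy data: one form L = (1 + 2i, 3 - i), shift a = 1, W = 3.
W = 3
Ls = [[(1, 2), (3, -1)]]
shifts = [(1, 0)]

local_5 = omega([(2, 1)], 5, W, Ls, shifts)     # d = 2 + i, norm 5
local_13 = omega([(3, 2)], 13, W, Ls, shifts)   # d = 3 + 2i, norm 13
joint = omega([(4, 7)], 65, W, Ls, shifts)      # d = (2 + i)(3 + 2i) = 4 + 7i
print(local_5, local_13, joint)
```

In this toy case the joint density over $(\mathbf{Z}/65\mathbf{Z})^2$ factors as the product of the densities modulo 5 and modulo 13, which is the Chinese remainder theorem step used in the proof.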



The above lemma splits $\omega$ into local expressions at a single value of $n \in N(P[i]')$.
We now estimate each of these local terms; it is here that we must use the various
non-degeneracy hypotheses we have placed on the $L_{st}$.

Lemma 10.2 (No significant local correlations). Let $n \in N(P[i]')$ be such that
$n > w$, and for each $s$ let $d_s \in \mathbf{Z}[i]'_{sq}$ be such that $d_s \mid n$ (thus for fixed $n$ there are
only four possible values of $d_s$, up to association). Suppose that $w$ is sufficiently
large depending on the linear forms $(L_s)_{s \in S}$. Then $\omega((d_s)_{s \in S}) = 1$ if $\prod_{s \in S} d_s$ is a
Gaussian unit, $\omega((d_s)_{s \in S}) = 1/n$ if $\prod_{s \in S} d_s$ is a Gaussian prime, and $\omega((d_s)_{s \in S}) =
O(1/n^2)$ otherwise.



Proof. The claim is trivial when $\prod_{s \in S} d_s$ is a Gaussian unit. Now suppose that
$\prod_{s \in S} d_s$ is a Gaussian prime $p$, which is necessarily unexceptional since each $d_s$ lies in
$\mathbf{Z}[i]'_{sq,W}$. We thus have $N(p) = n$, and one of the $d_s$ is associate to $p$, with the
remaining $d_s$ being Gaussian units. By (67), it suffices to show that

$$E\big( 1_{p \mid W(L_s \cdot x) + a_s} \ \big|\ x \in (\mathbf{Z}/n\mathbf{Z})^T \big) = 1/n$$

for each $s \in S$. Since $N(p) = n > w$, we see that $W$ is invertible in $\mathbf{Z}[i]/p\mathbf{Z}[i]$, and
so the map $x \mapsto Wx + a_s$ is a bijection on $\mathbf{Z}[i]/p\mathbf{Z}[i]$. It will thus suffice to show that
the homomorphism from $(\mathbf{Z}/n\mathbf{Z})^T$ to $\mathbf{Z}[i]/p\mathbf{Z}[i]$ induced by the map $x \mapsto L_s \cdot x$
is surjective. But since $p$ is unexceptional, $\mathbf{Z}[i]/p\mathbf{Z}[i]$ is a cyclic group of prime
order. Since the linear part of $\psi_s$ is not identically zero, the claim follows if $w$
is assumed sufficiently large.

Now suppose $\prod_{s \in S} d_s$ is not a unit or a Gaussian prime; then there exist Gaussian
primes $p, p'$ with norm $N(p) = N(p') = n$, and indices $s, s'$, with either $s \ne s'$ or
$p$ not associate to $p'$, such that $d_s$ is a multiple of $p$ and $d_{s'}$ is a multiple of $p'$. It
thus suffices to show that

$$E\big( 1_{p \mid W(L_s \cdot x) + a_s} 1_{p' \mid W(L_{s'} \cdot x) + a_{s'}} \ \big|\ x \in (\mathbf{Z}/n\mathbf{Z})^T \big) \le 1/n^2.$$

Observe that $n^2$ is the cardinality of $\mathbf{Z}[i]/p\mathbf{Z}[i] \times \mathbf{Z}[i]/p'\mathbf{Z}[i]$. Again, since $N(p) =
N(p') > w$, the map $(x, y) \mapsto (Wx + a_s, Wy + a_{s'})$ is a bijection on $\mathbf{Z}[i]/p\mathbf{Z}[i] \times
\mathbf{Z}[i]/p'\mathbf{Z}[i]$. It thus suffices to show that the homomorphism $\Phi$ from $(\mathbf{Z}/n\mathbf{Z})^T$ to
$\mathbf{Z}[i]/p\mathbf{Z}[i] \times \mathbf{Z}[i]/p'\mathbf{Z}[i]$ induced by $x \mapsto (L_s \cdot x, L_{s'} \cdot x)$ is surjective.

Suppose first that $s \ne s'$. Observe that as $p$ and $p'$ are Gaussian primes with the
same norm $n$, they are either associate to each other, or else $p$ is associate to the
complex conjugate of $p'$. In the latter case we may replace $L_{s'}, a_{s'}$ with their complex
conjugates $\bar{L}_{s'}, \bar{a}_{s'}$; note that this does not affect the hypotheses we have placed on
the $L_{st}$ or $a_s$. Thus up to association we may assume that $p = p'$.

Suppose for contradiction that $\Phi$ is not surjective; then its image is a proper
subgroup of $\mathbf{Z}[i]/p\mathbf{Z}[i] \times \mathbf{Z}[i]/p\mathbf{Z}[i]$, i.e. a line or the origin (note that $\mathbf{Z}[i]/p\mathbf{Z}[i]$ is a
finite field of rational prime order, since $p \in P[i]'$ is unexceptional). Since the $L_s$
are non-zero, the latter option is ruled out (if $w$ is large enough depending on the
$L_s$). Thus the image is a line. This forces $L_s$ and $L_{s'}$ to be concurrent in the finite
field geometry $(\mathbf{Z}/n\mathbf{Z})^T$. But this implies that $L_{st}L_{s't'} - L_{st'}L_{s't}$ is divisible by
$n$ for all $t, t' \in T$. If $w$ and hence $n$ is sufficiently large depending on the $L_{st}$, we
conclude that $L_{st}L_{s't'} - L_{st'}L_{s't} = 0$ for all $t, t' \in T$, but this forces $L_s$ and $L_{s'}$ to
be $\mathbf{Q}[i]$-multiples of each other, a contradiction.

It remains to consider the case when $s = s'$, which forces $p'$ to be associate to the
conjugate of $p$. Performing the conjugation, it suffices to show that the homomorphism
from $(\mathbf{Z}/n\mathbf{Z})^T$ to $(\mathbf{Z}[i]/p\mathbf{Z}[i])^2$ induced by $x \mapsto (L_s \cdot x, \bar{L}_s \cdot x)$ is surjective. But
this follows by arguing as before. ■
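
The surjectivity argument in this proof can be illustrated numerically. The following sketch identifies $\mathbf{Z}[i]/p\mathbf{Z}[i]$ with $\mathbf{Z}/n\mathbf{Z}$ by sending $i$ to a square root of $-1$ modulo $n$, and then counts joint solutions. All concrete values (the prime $2+i$, $W = 3$, the forms and the shifts) are hypothetical, and $n = 5$ is far too small for the "w sufficiently large" hypothesis to carry any meaning; the example only shows the dichotomy between independent and proportional forms. It assumes Python 3.8 or later for the modular inverse `pow(b, -1, n)`.

```python
from fractions import Fraction
from itertools import product

def root_of_minus_one(p):
    """For p = a + bi with N(p) = n prime, return (n, r) where i = r in Z[i]/pZ[i]."""
    a, b = p
    n = a * a + b * b
    r = (-a * pow(b % n, -1, n)) % n       # a + b*i = 0  =>  i = -a / b (mod n)
    return n, r

def reduce_mod(z, p):
    """Image of z = (x, y) under the reduction Z[i] -> Z[i]/pZ[i] = Z/nZ."""
    n, r = root_of_minus_one(p)
    return (z[0] + r * z[1]) % n

def joint_density(constraints, W, T):
    """Density of x in (Z/nZ)^T with p | W (L . x) + a for each (p, L, a)."""
    first_prime = constraints[0][0]
    n = first_prime[0] ** 2 + first_prime[1] ** 2   # all primes share this norm
    hits = 0
    for x in product(range(n), repeat=T):
        if all((W * sum(reduce_mod(L[t], p) * x[t] for t in range(T))
                + reduce_mod(a, p)) % n == 0 for p, L, a in constraints):
            hits += 1
    return Fraction(hits, n ** T)

W = 3
p, p_bar = (2, 1), (2, -1)                 # 2 + i and its conjugate, both of norm 5
L1 = [(1, 0), (0, 1)]                      # L_s  = (1, i)
L2 = [(1, 0), (1, 0)]                      # L_s' = (1, 1), not a Q[i]-multiple of L_s

single = joint_density([(p, L1, (1, 0))], W, 2)
independent = joint_density([(p, L1, (1, 0)), (p_bar, L2, (1, 0))], W, 2)
# Degenerate comparison: the second form is twice the first, with the same prime.
proportional = joint_density([(p, L1, (1, 0)),
                              (p, [(2, 0), (0, 2)], (2, 0))], W, 2)
print(single, independent, proportional)
```

With these choices a single constraint has density $1/n$, two independent constraints have density $1/n^2$, and the proportional pair collapses back to density $1/n$, which is exactly the situation the non-degeneracy hypotheses exclude.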

As a particular corollary we obtain the crude estimate

Lemma 10.3. If $d_s \in \mathbf{Z}[i]'_{sq,W}$ for all $s \in S$, we have

$$\omega((d_s)_{s \in S}) \le \frac{1}{[(N(d_s))_{s \in S}]}$$

where $[(N(d_s))_{s \in S}]$ is the least common multiple of the $N(d_s)$.

Proof. Using Lemma 10.1 it suffices to verify this when the $d_s$ all divide $n$ for some
$n \in N(P[i]')$ with $n > w$. But then this follows from Lemma 10.2, just by using the
crude bound $\omega((d_s)_{s \in S}) \le 1/n$ whenever $\prod_{s \in S} d_s$ is not a Gaussian unit. ■

With these estimates in hand, we can now return to proving (69). We would like
to take advantage of the multiplicativity of $\omega$ to obtain an Euler factorization of the
left-hand side, but we must first deal with the non-multiplicative factors $\varphi$. This
we shall do by Fourier expansion^. Since $\varphi(x)$ is smooth and compactly supported,
so is $e^x \varphi(x)$, and so we have an expansion

$$e^x \varphi(x) = \int_{-\infty}^{\infty} \psi(t) e^{-ixt}\, dt \qquad (70)$$

for some function $\psi$ depending on $\varphi$ which is rapidly decreasing in the sense that
$\psi(t) = O_A((1 + |t|)^{-A})$ for all $A > 0$. In particular $\psi$ is absolutely integrable and
there will be no difficulty justifying interchange of sums and integrals in what
follows. We can now expand

$$\varphi\Big(\frac{\log N(d)}{\log N(R)}\Big) = \int_{-\infty}^{\infty} N(d)^{-(1+it)/\log N(R)} \psi(t)\, dt.$$

We could substitute this into (69), which is essentially what is done in [9] (and in
the earlier work of Goldston and Yildirim in [6], [4], [5]). However, one would then
eventually need to estimate expressions for large $t$ which would require knowledge
of a zero-free region of the zeta function for $\mathbf{Z}[i]$ around the axis $s = 1 + it$. While
this is certainly possible, one can avoid any dependence on a zero-free region (other
than that near $s = 1$) by truncating $t$ at this stage of the argument, thus making
the argument slightly more elementary. More precisely, let $I$ be the interval $\{t \in
\mathbf{R} : |t| \le \log^{1/2} N(R)\}$, and exploit the rapid decrease of $\psi$ to now write

$$\varphi\Big(\frac{\log N(d)}{\log N(R)}\Big) = \int_I N(d)^{-(1+it)/\log N(R)} \psi(t)\, dt + O_{A,\varphi}\big( |d|^{-1/\log N(R)} \log^{-A} N(R) \big)$$

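for any $A > 0$. Multiplying this out (and taking advantage of the fact that the
terms are supported on the region where $N(d) \le N(R)$), we obtain

$$\prod_{s \in S} \varphi\Big(\frac{\log N(d_s)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_s)}{\log N(R)}\Big) = \int_I \cdots \int_I \prod_{s \in S} N(d_s)^{-(1+it_s)/\log N(R)} N(d'_s)^{-(1+it'_s)/\log N(R)} \psi(t_s)\psi(t'_s)\, dt_s\, dt'_s$$
$$+\ O_{A,\varphi,|S|}\Big( \Big( \prod_{s \in S} |d_s d'_s| \Big)^{-1/\log N(R)} \log^{-A} N(R) \Big).$$

^One could also express $\varphi$ as a contour integral, which amounts to much the same thing.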
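This allows us to write the left-hand side of (69) as

$$\int_I \cdots \int_I \sum_{d_s, d'_s \in \mathbf{Z}[i]'_{sq,W}\ \forall s \in S} \omega(([d_s, d'_s])_{s \in S}) \prod_{s \in S} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, N(d_s)^{-(1+it_s)/\log N(R)} N(d'_s)^{-(1+it'_s)/\log N(R)} \psi(t_s)\psi(t'_s)\, dt_s\, dt'_s \qquad (71)$$

plus an error term

$$O_{A,\varphi,|S|}\Big( \log^{-A} N(R) \sum_{d_s, d'_s \in \mathbf{Z}[i]'_{sq,W}\ \forall s \in S} \omega(([d_s, d'_s])_{s \in S}) \prod_{s \in S} |d_s d'_s|^{-1/\log N(R)} \Big). \qquad (72)$$

Let us first dispose of the error term. By Lemma 10.3 this expression is bounded
by

$$O_{A,\varphi,|S|}(\log^{-A} N(R)) \sum_{d_s, d'_s \in \mathbf{Z}[i]'_{sq,W}\ \forall s \in S} \frac{\prod_{s \in S} |d_s d'_s|^{-1/\log N(R)}}{[(N([d_s, d'_s]))_{s \in S}]}$$

which has an Euler factorization

$$O_{A,\varphi,|S|}(\log^{-A} N(R)) \prod_{n \in N(P[i]') : n > w}\ \sum_{d_s, d'_s \in \mathbf{Z}[i]'^{(n)}_{sq}\ \forall s \in S} \frac{\prod_{s \in S} |d_s d'_s|^{-1/\log N(R)}}{[(N([d_s, d'_s]))_{s \in S}]}$$

where $\mathbf{Z}[i]'^{(n)}_{sq} := \{a + bi \in \mathbf{Z}[i]'_{sq} : a, b > 0;\ a + bi \mid n\}$; note this set consists of two
elements for every $n \in N(P[i]')$. Direct calculation shows that

$$\sum_{d_s, d'_s \in \mathbf{Z}[i]'^{(n)}_{sq}\ \forall s \in S} \frac{\prod_{s \in S} |d_s d'_s|^{-1/\log N(R)}}{[(N([d_s, d'_s]))_{s \in S}]} \le \big( 1 + n^{-1 - 1/(2\log N(R))} \big)^{O_{|S|}(1)}.$$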

Thus we can bound (72) by

$$O_{A,\varphi,|S|}(\log^{-A} N(R)) \prod_{n \in P} \big( 1 + n^{-1 - 1/(2\log N(R))} \big)^{O_{|S|}(1)}$$

where $P = \{2, 3, 5, \ldots\}$ is the set of rational primes. Expanding out the Euler
product, this can be bounded by

$$O_{A,\varphi,|S|}\Big( \log^{-A} N(R)\ \zeta\Big(1 + \frac{1}{2\log N(R)}\Big)^{O_{|S|}(1)} \Big)$$

where $\zeta(\sigma + it) = \sum_{n=1}^{\infty} \frac{1}{n^{\sigma+it}} = \prod_{q \in P} (1 - q^{-\sigma-it})^{-1}$ is the usual Riemann zeta
function. Using the crude bound $\zeta(\sigma + it) = O(1 + 1/|\sigma - 1|)$ for $\sigma > 1$ coming
from the integral test, we obtain the upper bound

$$O_{A,\varphi,|S|}\big( \log^{-A + O_{|S|}(1)} N(R) \big).$$

The contribution of this to (69) will be acceptable if $A$ is chosen sufficiently large
depending on $|S|$.
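It remains to show that the main term (71) is equal to

$$\big(1 + o_{R \to \infty; |S|,|T|,\varphi,W,(L_s)_{s \in S}}(1) + o_{w \to \infty; |S|,|T|,\varphi,(L_s)_{s \in S}}(1)\big) \Big( \frac{c'_\varphi\, N(W)}{\log N(R)\, \phi_{\mathbf{Z}[i]}(W)} \Big)^{|S|}.$$

Using Lemma 10.1, we can factorize the integrand, writing (71) as

$$16^{|S|} \int_I \cdots \int_I K((t_s, t'_s)_{s \in S}) \prod_{s \in S} \psi(t_s)\psi(t'_s)\, dt_s\, dt'_s \qquad (73)$$

where

$$K((t_s, t'_s)_{s \in S}) := \prod_{n \in N(P[i]') : n > w}\ \sum_{d_s, d'_s \in \mathbf{Z}[i]'^{(n)}_{sq}\ \forall s \in S} \omega(([d_s, d'_s])_{s \in S}) \prod_{s \in S} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, N(d_s)^{-(1+it_s)/\log N(R)} N(d'_s)^{-(1+it'_s)/\log N(R)};$$

the factor of $16^{|S|}$ comes from the freedom to multiply each of $d_s, d'_s$ by one of the
four Gaussian units.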

Now we control the local factor. 

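Lemma 10.4. Let $n \in N(P[i]')$ be such that $n > w$. Then the expression

$$\sum_{d_s, d'_s \in \mathbf{Z}[i]'^{(n)}_{sq}\ \forall s \in S} \omega(([d_s, d'_s])_{s \in S}) \prod_{s \in S} \mu_{\mathbf{Z}[i]}(d_s)\mu_{\mathbf{Z}[i]}(d'_s)\, N(d_s)^{-(1+it_s)/\log N(R)} N(d'_s)^{-(1+it'_s)/\log N(R)} \qquad (74)$$

is equal to

$$\Big(1 + O_{|S|}\Big(\frac{1}{n^2}\Big)\Big) \prod_{s \in S}\ \prod_{p \in P[i]'_+ : N(p) = n} \frac{(1 - N(p)^{-1-(1+it_s)/\log N(R)})(1 - N(p)^{-1-(1+it'_s)/\log N(R)})}{1 - N(p)^{-1-(2+it_s+it'_s)/\log N(R)}}.$$

Proof. By Lemma 10.2, all the terms in which $\prod_{s \in S} [d_s, d'_s]$ contains more than one
Gaussian prime will give a net contribution of $O_{|S|}(1/n^2)$. We are left with those
terms in which all but at most one of the expressions $[d_s, d'_s]$ are equal to 1, with
the remaining expression $[d_s, d'_s]$ equal to either 1 or a Gaussian prime in $P[i]'_+$ with
norm $n$. We thus can write (74) as

$$1 + \sum_{s \in S}\ \sum_{p \in P[i]'_+ : N(p) = n} \Big( - N(p)^{-1-(1+it_s)/\log N(R)} - N(p)^{-1-(1+it'_s)/\log N(R)} + N(p)^{-1-(2+it_s+it'_s)/\log N(R)} \Big) + O_{|S|}\Big(\frac{1}{n^2}\Big)$$

and the claim follows. ■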
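
The last step of this proof, passing from the three surviving terms to the quotient of Euler factors at a relative cost of $O(1/n^2)$, can be checked numerically in the simplest case $|S| = 1$. The sketch below is an illustration only: the value of $\log N(R)$, the list of norms and the sample values of $t, t'$ are arbitrary assumptions, and the two first-quadrant primes of norm $n$ are accounted for by the factor 2 and the square.

```python
import cmath
import math

def weights(n, t, tp, log_NR):
    """Return u = n^{-(1+it)/log N(R)} and v = n^{-(1+it')/log N(R)}."""
    u = cmath.exp(-(1 + 1j * t) * math.log(n) / log_NR)
    v = cmath.exp(-(1 + 1j * tp) * math.log(n) / log_NR)
    return u, v

def local_sum(n, t, tp, log_NR):
    """1 plus the contributions of (p,1), (1,p), (p,p) for both primes p of norm n."""
    u, v = weights(n, t, tp, log_NR)
    return 1 + 2 * (-u / n - v / n + u * v / n)

def euler_factor(n, t, tp, log_NR):
    """Product over the two primes of norm n of the quotient in Lemma 10.4."""
    u, v = weights(n, t, tp, log_NR)
    one_prime = (1 - u / n) * (1 - v / n) / (1 - u * v / n)
    return one_prime ** 2

norms = [5, 13, 17, 29, 37, 41, 53, 61, 73, 89, 97, 101]   # rational primes = 1 mod 4
log_NR = 50.0                                               # hypothetical value of log N(R)
worst = max(n * n * abs(local_sum(n, t, tp, log_NR) - euler_factor(n, t, tp, log_NR))
            for n in norms for t in (-3.0, 0.0, 2.0) for tp in (-1.0, 0.5, 4.0))
print(worst)
```

The printed quantity is the largest observed value of $n^2$ times the discrepancy; it stays bounded as $n$ grows, in line with the $O_{|S|}(1/n^2)$ error in the lemma.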
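From the convergence of the infinite product $\prod_{n \ge 1} (1 + O_{|S|}(1/n^2))$ we see that

$$\prod_{n > w} \big(1 + O_{|S|}(1/n^2)\big) = 1 + o_{w \to \infty; |S|}(1).$$

We thus have

$$K((t_s, t'_s)_{s \in S}) = (1 + o_{w \to \infty; |S|}(1)) \prod_{s \in S} \frac{\zeta_{\mathbf{Z}[i]',W}\big(1 + \frac{2 + it_s + it'_s}{\log N(R)}\big)}{\zeta_{\mathbf{Z}[i]',W}\big(1 + \frac{1 + it_s}{\log N(R)}\big)\, \zeta_{\mathbf{Z}[i]',W}\big(1 + \frac{1 + it'_s}{\log N(R)}\big)}$$

where $\zeta_{\mathbf{Z}[i]',W}$ is the truncated Gaussian integer zeta function

$$\zeta_{\mathbf{Z}[i]',W}(s) := \prod_{p \in P[i]'_+ : N(p) > w} \frac{1}{1 - N(p)^{-s}}. \qquad (75)$$

Next, we obtain a crude estimate on this zeta function.

Lemma 10.5. If $\sigma > 1$ and $|\sigma + it - 1| \le c$ for some absolute constant $c > 0$, we
have

$$\zeta_{\mathbf{Z}[i]',W}(\sigma + it) = (1 + o_{\sigma + it \to 1; W}(1))\, \frac{c_0\, \phi_{\mathbf{Z}[i]}(W)}{N(W)}\, \frac{1}{\sigma + it - 1}$$

for some absolute constant $c_0 > 0$ (which does not depend on any parameter).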
Proof Observe that 

:N(p)<«,(l - 

where $P[i]_+$ denotes those Gaussian primes in the first quadrant $\{a + bi : a > 0;\ b \ge
0\}$ and

$$\zeta_{\mathbf{Z}[i]}(\sigma + it) := 4 \prod_{p \in P[i]_+} \frac{1}{1 - N(p)^{-\sigma - it}}.$$

On the other hand, from the Chinese remainder theorem and the definition of W 
we see that 

Also, since $P[i]_+ \setminus P[i]'_+$ consists of the prime $1 + i$ dividing 2 and the rational primes
equal to 3 modulo 4, we see that

$$\prod_{p \in P[i]_+ \setminus P[i]'_+} (1 - N(p)^{-1}) = c_1$$

for some absolute constant $c_1 > 0$. To conclude the claim (for $s$ sufficiently close
to 1), it will suffice to show that

$$\zeta_{\mathbf{Z}[i]}(\sigma + it) = (1 + o_{\sigma + it \to 1}(1))\, \frac{\pi}{\sigma + it - 1}.$$

But by the unique factorization of the Gaussian integers (and the fact that there
are exactly 4 Gaussian units) we have

$$\zeta_{\mathbf{Z}[i]}(\sigma + it) = \sum_{n \in \mathbf{Z}[i] \setminus 0} \frac{1}{N(n)^{\sigma + it}}.$$

By the integral test we can estimate

$$\zeta_{\mathbf{Z}[i]}(\sigma + it) = \int\!\!\int_{a^2 + b^2 \ge 1} \frac{da\, db}{(a^2 + b^2)^{\sigma + it}} + O(1)$$

which after polar co-ordinates becomes

$$\zeta_{\mathbf{Z}[i]}(\sigma + it) = \frac{\pi}{\sigma + it - 1} + O(1)$$

and the claim follows. ■
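
The identity between the sum over non-zero Gaussian integers and four times the Euler product over first-quadrant primes can be checked numerically away from the pole. The sketch below does this at the real point $s = 2$; the truncation parameters `M` and `Q` are arbitrary assumptions, and the agreement is only up to the truncation error of both sides. It does not attempt to probe the pole at $s = 1$, where both truncations converge too slowly to be informative.

```python
import math

def lattice_sum(s, M):
    """Sum of N(n)^{-s} over non-zero Gaussian integers n = a + bi with |a|, |b| <= M."""
    total = 0.0
    for a in range(-M, M + 1):
        for b in range(-M, M + 1):
            if a or b:
                total += (a * a + b * b) ** (-s)
    return total

def primes_up_to(Q):
    """Sieve of Eratosthenes returning the rational primes q <= Q."""
    sieve = [True] * (Q + 1)
    sieve[0:2] = [False, False]
    for q in range(2, int(Q ** 0.5) + 1):
        if sieve[q]:
            sieve[q * q::q] = [False] * len(range(q * q, Q + 1, q))
    return [q for q in range(Q + 1) if sieve[q]]

def euler_product(s, Q):
    """4 times the product over first-quadrant Gaussian primes above rational primes q <= Q."""
    prod = 4.0
    for q in primes_up_to(Q):
        if q == 2:
            prod /= 1 - 2.0 ** (-s)               # the prime 1 + i, of norm 2
        elif q % 4 == 3:
            prod /= 1 - float(q) ** (-2 * s)      # inert prime q, of norm q^2
        else:
            prod /= (1 - float(q) ** (-s)) ** 2   # two primes of norm q
    return prod

sum_side = lattice_sum(2.0, 200)
product_side = euler_product(2.0, 1000)
print(sum_side, product_side)
```

The factor 4 in the product accounts for the four Gaussian units, as in the definition of $\zeta_{\mathbf{Z}[i]}$ above.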



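Applying this lemma and recalling that $t_s, t'_s \in I$ and hence $t_s, t'_s = O(\log^{1/2} N(R))$,
we conclude that

$$K((t_s, t'_s)_{s \in S}) = (1 + o_{w \to \infty; |S|}(1) + o_{R \to \infty; W, |S|}(1)) \prod_{s \in S} \frac{N(W)}{c_0\, \phi_{\mathbf{Z}[i]}(W) \log N(R)}\, \frac{(1 + it_s)(1 + it'_s)}{2 + it_s + it'_s}$$

and so we can write (73) as

$$\Big( \frac{16\, N(W)}{c_0\, \phi_{\mathbf{Z}[i]}(W) \log N(R)} \Big)^{|S|} \int_I \cdots \int_I (1 + o_{w \to \infty; |S|}(1) + o_{R \to \infty; W, |S|}(1)) \prod_{s \in S} \frac{(1 + it_s)(1 + it'_s)}{2 + it_s + it'_s}\, \psi(t_s)\psi(t'_s)\, dt_s\, dt'_s.$$

The contributions of the error terms $o_{w \to \infty; |S|}(1) + o_{R \to \infty; W, |S|}(1)$ will be
acceptable, thanks to the rapid decay of the $\psi$ factors (and the at most polynomial
growth of the $\frac{(1 + it_s)(1 + it'_s)}{2 + it_s + it'_s}$ factors), so it suffices to estimate the main term, which
factorizes as

$$\Big( \frac{16\, N(W)}{c_0\, \phi_{\mathbf{Z}[i]}(W) \log N(R)} \int_I \int_I \frac{(1 + it)(1 + it')}{2 + it + it'}\, \psi(t)\psi(t')\, dt\, dt' \Big)^{|S|}.$$

Using the rapid decay of the $\psi$, we can write this as

$$\Big( (c'_\varphi + o_{R \to \infty}(1))\, \frac{N(W)}{\phi_{\mathbf{Z}[i]}(W) \log N(R)} \Big)^{|S|}$$

where

$$c'_\varphi := \frac{16}{c_0} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{(1 + it)(1 + it')}{2 + it + it'}\, \psi(t)\psi(t')\, dt\, dt'.$$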

It thus suffices to show that $c'_\varphi$ is real and positive. We remark that this can be
shown indirectly, by observing that the left-hand side of (58) is necessarily
non-negative, and when $|S| = |T| = 1$ one can show using (46) and a pigeonholing
argument that this left-hand side is at least $C_{\varphi,W} \log N(R)$ for some $C_{\varphi,W} > 0$,
and all $R$ sufficiently large depending on $W$; by choosing $W$ appropriately we
obtain the positivity of $c'_\varphi$. However, we can also argue directly via the following
Fourier-analytic argument^. Making the change of variables $s = t + t'$, we have

$$c'_\varphi = \frac{16}{c_0} \int_{-\infty}^{\infty} \frac{1}{2 + is}\, F(s)\, ds$$

where $F$ is the convolution

$$F(s) := \int_{-\infty}^{\infty} (1 + it)\psi(t)\, (1 + i(s - t))\psi(s - t)\, dt.$$

Observe that for any real number $x$ the Fourier transform of $F$ can be computed
as

$$\int_{-\infty}^{\infty} F(s) e^{-ixs}\, ds = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} (1 + it)\psi(t) e^{-ixt}\, (1 + i(s - t))\psi(s - t) e^{-ix(s - t)}\, dt\, ds$$
$$= \Big( \int_{-\infty}^{\infty} (1 + it)\psi(t) e^{-ixt}\, dt \Big)^2$$
$$= \Big( \Big(1 - \frac{d}{dx}\Big) \int_{-\infty}^{\infty} \psi(t) e^{-ixt}\, dt \Big)^2$$
$$= [e^x \varphi'(x)]^2$$

where we have used the rapid decrease of the $\psi$ to justify all the swapping of
integrals, and (70) in the last line. Now we write $\frac{1}{2 + is} = \int_0^{\infty} e^{-2x} e^{-isx}\, dx$ and
interchange integrals again (using the rapid decay of $F$) to conclude

$$\int_{-\infty}^{\infty} \frac{1}{2 + is}\, F(s)\, ds = \int_0^{\infty} e^{-2x} \int_{-\infty}^{\infty} F(s) e^{-isx}\, ds\, dx = \int_0^{\infty} [\varphi'(x)]^2\, dx$$

and hence

$$c'_\varphi = \frac{16}{c_0} \int_0^{\infty} [\varphi'(x)]^2\, dx > 0$$

as desired. This concludes the proof of Proposition 9.1.
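
The Fourier-analytic identity behind this positivity can be checked numerically, up to the constant factor in front. The sketch below is an illustration under an explicit assumption: instead of a compactly supported cutoff it takes $e^x \varphi(x) = e^{-x^2/2}$, so that the function $\psi$ in (70) is the Gaussian $e^{-t^2/2}/\sqrt{2\pi}$. This choice is hypothetical and made only because both sides are then easy to integrate; the grid sizes are likewise arbitrary.

```python
import math

def psi(t):
    # Assumed test function: with e^x * phi(x) = exp(-x^2 / 2), (70) gives this Gaussian.
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def double_integral(T=8.0, h=0.05):
    """Riemann sum for the integral of (1+it)(1+it')/(2+it+it') psi(t) psi(t') dt dt'."""
    k = int(round(T / h))
    ts = [j * h for j in range(-k, k + 1)]
    weights = [complex(1, t) * psi(t) for t in ts]
    total = 0j
    for t, wt in zip(ts, weights):
        for tp, wtp in zip(ts, weights):
            total += wt * wtp / complex(2, t + tp)
    return total * h * h

def energy_integral(X=12.0, h=1e-3):
    """Trapezoid rule for the integral over [0, X] of phi'(x)^2, phi(x) = exp(-x - x^2/2)."""
    def integrand(x):
        # phi'(x) = -(1 + x) * exp(-x - x^2 / 2)
        return ((1 + x) * math.exp(-x - x * x / 2)) ** 2
    k = int(round(X / h))
    inner = sum(integrand(j * h) for j in range(1, k))
    return h * (0.5 * integrand(0.0) + inner + 0.5 * integrand(X))

lhs = double_integral()
rhs = energy_integral()
print(lhs, rhs)
```

The double integral comes out real up to rounding and agrees with the manifestly positive quantity $\int_0^\infty [\varphi'(x)]^2\, dx$ for this test function.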



11. Proof of Proposition 9.2 



Now we turn to Proposition 9.2. This will be similar to the proof of Proposition
9.1 in the preceding section, but with a number of differences. It is a little simpler
because there is only one parameter $n$ to sum over rather than $|T|$ parameters, and
also we only seek an upper bound rather than an asymptotic. As such we shall move
more rapidly with this proof as compared with the similar but more complicated
proof from the previous section.



One can of course also use contour integration as a substitute for Fourier analysis here; the 
two approaches are essentially equivalent. 
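We begin by eliminating the role of the interval $I$. Using (49), we can rewrite the
left-hand side of (60) as

$$\frac{\log^{2m} N(R)}{16^m} \sum_{d_1, d'_1, \ldots, d_m, d'_m \in \mathbf{Z}[i]'} \Big( \prod_{j=1}^m \mu_{\mathbf{Z}[i]}(d_j)\mu_{\mathbf{Z}[i]}(d'_j)\, \varphi\Big(\frac{\log N(d_j)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_j)}{\log N(R)}\Big) \Big)\, E\Big( \prod_{j=1}^m 1_{d_j, d'_j \mid W(h_j + nv) + b} \ \Big|\ n \in I \Big).$$

Due to the support of the $\varphi$, we can restrict the $d_j$ and $d'_j$ to the region $N(d_j), N(d'_j) \le
N(R)$. Now from the Chinese remainder theorem we have

$$E\Big( \prod_{j=1}^m 1_{d_j, d'_j \mid W(h_j + nv) + b} \ \Big|\ n \in I \Big) = \omega([d_1, d'_1], \ldots, [d_m, d'_m]) + O_m(R^{4m}/|I|)$$

where

$$\omega(q_1, \ldots, q_m) := E\Big( \prod_{j=1}^m 1_{q_j \mid W(h_j + nv) + b} \ \Big|\ n \in \mathbf{Z}/D\mathbf{Z} \Big)$$

and $D = D(q_1, \ldots, q_m)$ is the smallest positive rational integer which is a multiple of
all the $q_1, \ldots, q_m$. In our situation we have the crude estimate $D = O(R^{4m})$. Since
$|I| \ge R^{10m}$, it is easy to see that the contribution of the error term $O_m(R^{4m}/|I|)$ is
acceptable (if $W$ is large enough depending on $v$, but is sufficiently slowly growing
in $N$). Thus it suffices to show that

$$\sum_{d_1, d'_1, \ldots, d_m, d'_m \in \mathbf{Z}[i]'} \Big( \prod_{j=1}^m \mu_{\mathbf{Z}[i]}(d_j)\mu_{\mathbf{Z}[i]}(d'_j)\, \varphi\Big(\frac{\log N(d_j)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_j)}{\log N(R)}\Big) \Big)\, \omega([d_1, d'_1], \ldots, [d_m, d'_m])$$
$$= O_m\Big( \Big( \frac{N(W)}{\phi_{\mathbf{Z}[i]}(W) \log N(R)} \Big)^m \Big) \prod_{p \in P[i]'_+ : p \mid \Delta} (1 + O_m(N(p)^{-1/2})). \qquad (76)$$

Here we have used the support of $\varphi$ to drop the constraints $N(d_j), N(d'_j) \le N(R)$
again. Now observe that $\omega$ vanishes if any one of the $d_j$ or $d'_j$ shares a common
factor with $W$, since $b$ is coprime to $W$. Thus without loss of generality we may
restrict $d_1, \ldots, d_m, d'_1, \ldots, d'_m$ to $\mathbf{Z}[i]'_{sq,W}$.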

Now we must obtain analogues to Lemmas 10.1, 10.2, 10.3. By repeating the proof
of Lemma 10.1 with only trivial changes, we have

Lemma 11.1. If $d_1, \ldots, d_m, d'_1, \ldots, d'_m \in \mathbf{Z}[i]'_{sq,W}$, we have

$$\omega(q_1, \ldots, q_m) = \prod_{n \in N(P[i]') : n > w} \omega((q_1, n), \ldots, (q_m, n)).$$

Now we give the analogue of Lemma 10.2.

Lemma 11.2 (No significant local correlations). Let $n \in N(P[i]')$ be such that
$n > w$, and let $q_1, \ldots, q_m \in \mathbf{Z}[i]'_{sq,W}$ divide $n$. Suppose that $w$ is sufficiently
large depending on $v$. Then $\omega(q_1, \ldots, q_m) = 1$ if $q_1 \cdots q_m$ is a Gaussian unit, and
$\omega(q_1, \ldots, q_m) = 1/n$ if $q_1 \cdots q_m$ is a Gaussian prime. In all other cases, we have
$\omega(q_1, \ldots, q_m) = O(1/n) 1_{n \mid \Delta}$, where $\Delta$ was defined in (59).
Proof. In the first two cases (when $q_1 \cdots q_m$ is a Gaussian unit or a Gaussian
prime), the claim follows just as in Lemma 10.2, noting that $\mathbf{Z}[i]/p\mathbf{Z}[i]$ is cyclic of
prime order $n$ whenever $N(p) = n$, and that $W$ and $v$ are invertible in $\mathbf{Z}[i]/p\mathbf{Z}[i]$.

Now suppose that $q_1 \cdots q_m$ is the product of at least two primes. Then from the
preceding discussion we certainly have $\omega(q_1, \ldots, q_m) \le 1/n$ by discarding all but
one of the constraints $q_j \mid W(h_j + nv) + b$. This settles the claim when $n$ divides $\Delta$, so
now suppose that $n$ does not divide $\Delta$. This implies in particular that the $h_j$ are all
distinct in $\mathbf{Z}[i]/p\mathbf{Z}[i]$ for any Gaussian prime $p$ dividing $n$. Since $W$ and $v$ are also
invertible in $\mathbf{Z}[i]/p\mathbf{Z}[i]$, this means any two constraints of the form $p \mid W(h_j + nv) + b$
and $p \mid W(h_{j'} + nv) + b$ cannot simultaneously be true for any distinct $j, j'$. In a
similar spirit, since $\Delta$ is non-zero in $\mathbf{Z}[i]/p\mathbf{Z}[i]$ for any $p$ dividing $n$, we see that
W{hjV — hj'v) — bv + bv is similarly non-zero for any 1 < j < j' < m. A little 
algebra then shows that the constraints p\\V(hj + nv) + b and p\W{hji + nv) + b 
cannot simultaneously be true. Combining all these facts together, we see that the
constraints $q_j \mid W(h_j + nv) + b$ cannot be simultaneously satisfied for $1 \le j \le m$,
and $\omega(q_1, \ldots, q_m)$ vanishes as claimed. ■



As a particular corollary we obtain the analogue of Lemma 10.3:

Lemma 11.3. If $d_1, \ldots, d_m, d'_1, \ldots, d'_m \in \mathbf{Z}[i]'_{sq,W}$, then

$$\omega([d_1, d'_1], \ldots, [d_m, d'_m]) \le \frac{1}{[N(d_1), \ldots, N(d_m), N(d'_1), \ldots, N(d'_m)]}.$$

The proof is the same as that of Lemma 10.3 and is omitted.

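We return to the proof of (76). Once again, we use the expansion (70) of $e^x \varphi(x)$,
and obtain the expansion

$$\prod_{j=1}^m \varphi\Big(\frac{\log N(d_j)}{\log N(R)}\Big) \varphi\Big(\frac{\log N(d'_j)}{\log N(R)}\Big) = \int_I \cdots \int_I \prod_{j=1}^m N(d_j)^{-(1+it_j)/\log N(R)} N(d'_j)^{-(1+it'_j)/\log N(R)} \psi(t_j)\psi(t'_j)\, dt_j\, dt'_j$$
$$+\ O_{A,\varphi,m}\Big( \Big( \prod_{j=1}^m |d_j d'_j| \Big)^{-1/\log N(R)} \log^{-A} N(R) \Big)$$

where $I$ is the interval $I := \{t \in \mathbf{R} : |t| \le \log^{1/2} N(R)\}$, $\psi$ is rapidly decreasing,
and $A > 0$ is arbitrary. This allows us to write the left-hand side of (76) as a main
term

$$\int_I \cdots \int_I \sum_{d_1, \ldots, d_m, d'_1, \ldots, d'_m \in \mathbf{Z}[i]'_{sq,W}} \omega([d_1, d'_1], \ldots, [d_m, d'_m]) \prod_{j=1}^m \mu_{\mathbf{Z}[i]}(d_j)\mu_{\mathbf{Z}[i]}(d'_j)\, N(d_j)^{-(1+it_j)/\log N(R)} N(d'_j)^{-(1+it'_j)/\log N(R)} \psi(t_j)\psi(t'_j)\, dt_j\, dt'_j \qquad (77)$$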






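plus an error term

$$O_{A,\varphi,m}\Big( \sum_{d_1, \ldots, d_m, d'_1, \ldots, d'_m \in \mathbf{Z}[i]'_{sq,W}} \frac{\omega([d_1, d'_1], \ldots, [d_m, d'_m])}{|d_1 \cdots d_m d'_1 \cdots d'_m|^{1/\log N(R)} \log^A N(R)} \Big).$$

The error term is treated exactly as with (72), so we turn to treating the main term
(77). Our task is to estimate this term by

$$O_m\Big( \Big( \frac{N(W)}{\phi_{\mathbf{Z}[i]}(W) \log N(R)} \Big)^m \Big) \prod_{p \in P[i]'_+ : p \mid \Delta} (1 + O_m(N(p)^{-1/2})). \qquad (78)$$

Using Lemma 11.1, we can rewrite (77) as

$$16^m \int_I \cdots \int_I K(t_1, \ldots, t_m, t'_1, \ldots, t'_m) \prod_{j=1}^m \psi(t_j)\psi(t'_j)\, dt_j\, dt'_j \qquad (79)$$

where

$$K(t_1, \ldots, t_m, t'_1, \ldots, t'_m) := \prod_{n \in N(P[i]') : n > w}\ \sum_{d_1, \ldots, d_m, d'_1, \ldots, d'_m \in \mathbf{Z}[i]'^{(n)}_{sq}} \omega([d_1, d'_1], \ldots, [d_m, d'_m]) \prod_{j=1}^m \mu_{\mathbf{Z}[i]}(d_j)\mu_{\mathbf{Z}[i]}(d'_j)\, N(d_j)^{-(1+it_j)/\log N(R)} N(d'_j)^{-(1+it'_j)/\log N(R)}.$$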



Now we control the local factor, in complete analogy with Lemma 10.4.

Lemma 11.4. Let $n \in N(P[i]')$ be such that $n > w$. Then the expression

$$\sum_{d_1, \ldots, d_m, d'_1, \ldots, d'_m \in \mathbf{Z}[i]'^{(n)}_{sq}} \omega([d_1, d'_1], \ldots, [d_m, d'_m]) \prod_{j=1}^m \mu_{\mathbf{Z}[i]}(d_j)\mu_{\mathbf{Z}[i]}(d'_j)\, N(d_j)^{-(1+it_j)/\log N(R)} N(d'_j)^{-(1+it'_j)/\log N(R)} \qquad (80)$$

is equal to $1 + O_m(1/n)$ if $n$ divides $\Delta$, and is equal to

$$\Big(1 + O_m\Big(\frac{1}{n^2}\Big)\Big) \prod_{j=1}^m\ \prod_{p \in P[i]'_+ : N(p) = n} \frac{(1 - N(p)^{-1-(1+it_j)/\log N(R)})(1 - N(p)^{-1-(1+it'_j)/\log N(R)})}{1 - N(p)^{-1-(2+it_j+it'_j)/\log N(R)}}$$

otherwise.

Proof. If $n$ divides $\Delta$, then the claim follows from Lemma 11.3, so suppose that $n$
does not divide $\Delta$. But then the claim follows by exact repetition of the proof of
Lemma 10.4. ■
From the above lemma we see that



K(ti, ... ,tmiti, . . . , tjj^) —Or, 



n (l + 0„(l/n)) 

^eN(P[i]'):«>«^,n|A 

Czm;,„vf(1 + (2 + + logN(i?)) 
CZ[.]' ,v^(l + (l + ^^,)/logN(i^)Xz[,], ,^(i + (i + zi;.)/iogN(i?)) 

J i- y i sq' LJsq' 

where $\zeta_{\mathbf{Z}[i]',W}$ was defined in (75). Applying Lemma 10.5, we conclude



n 



K{ti, . . . , tm, t'l, ■ . ■ , t'rn) — O; 



n (l + 0„(l/n)) 

nGN(P[i]'):n>tu,"|A 



Inserting this into (79) and using the rapid decay of $\psi$, we can thus bound (79) by



Om{[ 



n 



(l + 0„(l/n))] 



N(P[ 



</'Zm(W^)™log"N(ii) 



) 



which is bounded by (78) as desired. This concludes the proof of Proposition 9.2
and hence Theorem 1.2. ■



12. Discussion 



The proof of Theorem 1.2 also gives a little bit more, namely that any subset of the 
Gaussian primes P[i] of positive relative density will contain infinitely many con- 
stellations of a prescribed shape, but we have chosen not to give this generalization 
in order to simplify the exposition slightly. 

Our method is also likely to extend to other number fields than the Gaussian
integers, at least if one has unique factorization. It is also likely that a relative
version of Theorem 1.1 exists, in which the set $A$ is a dense subset of the set $P^d$
(the set of lattice points in $\mathbf{Z}^d$ with prime coefficients) rather than $\mathbf{Z}^d$. However, a
technical problem arises when working with $P^d$, namely that $P^d$ (or any majorant
of $P^d$) generates significant correlations between certain elements $a + rv_j$ of the
constellation, even after removing obstructions coming from small divisors. For
instance, if $a + r(1, 0)$ and $a + r(0, 1)$ both lie in $P^2$, then $a$ itself necessarily also
lies in $P^2$. This issue means that the entire approach to this problem, based on
viewing $P^d$ as a dense subset of some suitably pseudorandom set, needs to be
somehow modified, unless one is working in a case when these correlations do not
appear (for instance, if all the $i$-th co-ordinates of the $v_j$ are distinct in $j$ for each $i$).
However even in such a model case there appear to be some non-trivial technical
difficulties, most notably in obtaining the dual function condition (Definition 2.7).
We will not pursue these matters here.